
CN114731406A - Encoding method, decoding method, encoding device, and decoding device - Google Patents

Encoding method, decoding method, encoding device, and decoding device

Info

Publication number
CN114731406A
Authority
CN
China
Prior art keywords
neural network
image
communication link
network model
encoding
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202080078054.3A
Other languages
Chinese (zh)
Other versions
CN114731406B (en)
Inventor
周焰
郑萧桢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SZ DJI Technology Co Ltd
Original Assignee
SZ DJI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SZ DJI Technology Co Ltd filed Critical SZ DJI Technology Co Ltd
Publication of CN114731406A
Application granted
Publication of CN114731406B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors
    • H04N19/517Processing of motion vectors by encoding
    • H04N19/52Processing of motion vectors by encoding by predictive encoding
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/70Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91Entropy coding, e.g. variable length coding [VLC] or arithmetic coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The application provides an encoding method, a decoding method, an encoding device and a decoding device. The method includes: encoding an image to be encoded using a neural-network-based encoding technique to obtain a code stream of the image to be encoded; transmitting the code stream of the image to be encoded over a first communication link; and transmitting model parameters of a neural network model included in the neural-network-based encoding technique over a second communication link. The scheme provided by the application can effectively solve the problems of managing and transmitting the model parameters of the neural network model; it can also reduce the bit consumption of the code stream; in addition, it can ease the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.

Description

Encoding method, decoding method, encoding device, and decoding device
Copyright declaration
The disclosure of this patent document contains material which is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the official records of the patent and trademark office.
Technical Field
The present application relates to the field of image processing, and more particularly, to an encoding method, a decoding method, and an encoding apparatus, a decoding apparatus.
Background
At present, research on neural-network-based encoding and decoding technology has drawn wide attention. For the management and transmission of the model parameters of a neural network model, one approach is to fix the model parameters of the neural network model and use them as a common library file available to both the encoder and the decoder; however, the effectiveness of this approach is reduced once some encoding tools of the encoder are changed.
Another approach is to transmit the model parameters of the neural network model to the decoding end through the code stream, so that the model parameters can be flexibly adjusted according to the requirements of the encoder. However, for a deep-learning neural network model with many layers, the number of parameters is generally large; if the model parameters are put directly into the code stream for transmission, the bit consumption increases and the video compression ratio decreases. In addition, because of the management and transmission requirements of the model parameters, encoding based on neural network techniques poses a greater challenge to low-latency transmission in image transmission scenarios.
Disclosure of Invention
The application provides an encoding method, a decoding method, an encoding device and a decoding device, which can effectively solve the problems of managing and transmitting the model parameters of the neural network model, can also reduce the bit consumption of the code stream, and can additionally ease the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
In a first aspect, an encoding method is provided, including: encoding an image to be encoded using a neural-network-based encoding technique to obtain a code stream of the image to be encoded; transmitting the code stream of the image to be encoded over a first communication link; and transmitting model parameters of a neural network model included in the neural-network-based encoding technique over a second communication link.
In a second aspect, a decoding method is provided, including: receiving a code stream of an image to be decoded over a first communication link; receiving model parameters of a neural network model over a second communication link; and decoding the code stream using the model parameters of the neural network model to obtain a decoded image.
In a third aspect, an encoding apparatus is provided, including a processor configured to: encode an image to be encoded using a neural-network-based encoding technique to obtain a code stream of the image to be encoded; transmit the code stream of the image to be encoded over a first communication link; and transmit model parameters of a neural network model included in the neural-network-based encoding technique over a second communication link.
In a fourth aspect, a decoding apparatus is provided, including a decoder configured to: receive a code stream of an image to be decoded over a first communication link; receive model parameters of a neural network model over a second communication link; and decode the code stream using the model parameters of the neural network model to obtain a decoded image.
In a fifth aspect, an encoding apparatus is provided that includes a processor and a memory. The memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method in the first aspect or any implementation thereof.
In a sixth aspect, a decoding apparatus is provided that includes a processor and a memory. The memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the method in the second aspect or any implementation thereof.
In a seventh aspect, a chip is provided for implementing the method in the first aspect or any implementation thereof.
Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method in the first aspect or any implementation thereof.
In an eighth aspect, a chip is provided for implementing the method in the second aspect or any implementation thereof.
Specifically, the chip includes a processor configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method in the second aspect or any implementation thereof.
In a ninth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program including instructions for performing the method in the first aspect or any possible implementation thereof.
In a tenth aspect, a computer-readable storage medium is provided for storing a computer program, the computer program including instructions for performing the method in the second aspect or any possible implementation thereof.
In an eleventh aspect, a computer program product is provided, including computer program instructions that cause a computer to perform the method in the first aspect or any implementation thereof.
In a twelfth aspect, a computer program product is provided, including computer program instructions that cause a computer to perform the method in the second aspect or any implementation thereof.
According to the solution provided by the application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model over the first communication link and the second communication link respectively, and the decoding end correspondingly receives them over the two links. This effectively solves the problems of managing and transmitting the model parameters of the neural network model; transmitting the model parameters over the second communication link reduces the bit consumption of the code stream; in addition, it eases the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
Drawings
FIG. 1 is an architecture diagram of a solution applying an embodiment of the present application;
FIG. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of an encoding method provided by an embodiment of the present application;
FIG. 4 is a schematic flow chart diagram of a decoding method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of image transmission of an intelligent coding technique provided by an embodiment of the present application;
fig. 6 is a schematic diagram of an encoding framework 2 provided in an embodiment of the present application;
FIG. 7 is a schematic flow chart of a method for training a neural network model according to an embodiment of the present disclosure;
fig. 8a is a schematic flowchart illustrating a video encoder applying an intelligent encoding technique according to an embodiment of the present application;
fig. 8b is a schematic flowchart illustrating a video decoder applying an intelligent encoding technique according to an embodiment of the present application;
fig. 9 is a schematic diagram of an encoding apparatus provided in an embodiment of the present application;
fig. 10 is a schematic diagram of a decoding apparatus provided in an embodiment of the present application;
fig. 11 is a schematic diagram of an encoding apparatus provided in another embodiment of the present application;
fig. 12 is a schematic diagram of a decoding apparatus provided in another embodiment of the present application;
fig. 13 is a schematic structural diagram of a chip provided in an embodiment of the present application.
Detailed Description
The following describes technical solutions in the embodiments of the present application.
Unless otherwise defined, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
Fig. 1 is an architecture diagram of a solution to which an embodiment of the present application is applied.
As shown in FIG. 1, the system 100 can receive data 102 to be processed and process it to generate processed data 108. For example, the system 100 may receive data to be encoded and encode it to produce encoded data, or the system 100 may receive data to be decoded and decode it to produce decoded data. In some embodiments, the components in the system 100 may be implemented by one or more processors, which may be processors in a computing device or in a mobile device (e.g., a drone). The processor may be any kind of processor, which is not limited in this embodiment of the present application. In some possible designs, the processor may include an encoder, a decoder, a codec, or the like. The system 100 may also include one or more memories. The memory may be used to store instructions and data, for example computer-executable instructions that implement the technical solutions of the embodiments of the present application, the data 102 to be processed, the processed data 108, and the like. The memory may be any kind of memory, which is not limited in this embodiment of the present application either.
The data to be encoded may include text, images, graphical objects, animation sequences, audio, video, or any other data that needs to be encoded. In some cases, the data to be encoded may include sensory data from sensors, which may be visual sensors (e.g., cameras, infrared sensors), microphones, near-field sensors (e.g., ultrasonic sensors, radar), position sensors, temperature sensors, touch sensors, and so forth. In some cases, the data to be encoded may include information from the user, e.g., biometric information, which may include facial features, fingerprint scans, retinal scans, voice recordings, DNA samples, and the like.
Fig. 2 is a schematic diagram of a video coding framework 2 according to an embodiment of the present application. As shown in fig. 2, after the video to be encoded is received, each frame of the video is encoded in turn, starting from the first frame. The current encoding frame mainly goes through Prediction, Transform, Quantization, Entropy Coding and other processing, and the code stream of the current encoding frame is finally output. Correspondingly, the decoding process generally decodes the received code stream according to the inverse of the above process to recover the video frame information present before encoding.
Specifically, as shown in fig. 2, the video coding framework 2 includes a coding control module 201 for performing decision control and parameter selection during the coding process. For example, as shown in fig. 2, the coding control module 201 controls the parameters used in transformation, quantization, inverse quantization and inverse transformation, controls the selection of the intra mode or the inter mode, and controls the parameters of motion estimation and filtering; the control parameters of the coding control module 201 are also input to the entropy coding module and encoded to form a part of the coded code stream.
When the current encoding frame is encoded, the encoding frame is divided 202: specifically, the encoding frame is first divided into slices and then into blocks. Optionally, in an example, the encoding frame is divided into a plurality of non-overlapping largest coding tree units (CTUs), and each CTU may be further iteratively divided into a series of smaller coding units (CUs) in a quadtree, binary-tree or ternary-tree manner. In some examples, a CU may further include a prediction unit (PU) and a transform unit (TU) associated with it, where the PU is the basic unit of prediction and the TU is the basic unit of transform and quantization. In some examples, the PU and the TU are each divided into one or more blocks on the basis of a CU, where a PU includes multiple prediction blocks (PBs) and associated syntax elements. In some examples, the PU and the TU may be the same, or may be derived from the CU by different partitioning methods. In some examples, at least two of the CU, PU and TU are the same; for example, the CU, PU and TU are not distinguished, and prediction, quantization and transformation are all performed in units of CUs. For convenience of description, a CTU, CU or other formed data unit is hereinafter referred to as an encoding block.
It should be understood that in the embodiments of the present application, the data unit for video coding may be a frame, a slice, a coding tree unit, a coding block or a group of any of the above. The size of the data units may vary in different embodiments.
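As an illustration of the block partitioning just described, the following is a minimal sketch of recursively splitting a CTU into CUs in a quadtree manner; it is not any codec's actual partition decision logic (which is driven by rate-distortion cost), and the 64x64 CTU size and the minimum CU size are assumed values.
```python
def quadtree_split(x, y, size, min_size=8, should_split=None):
    """Recursively split a square block at (x, y) into four equal
    sub-blocks (quadtree), returning the list of leaf CUs."""
    if should_split is None:
        # Placeholder decision; a real encoder decides by rate-distortion cost.
        should_split = lambda x, y, s: s > min_size
    if size <= min_size or not should_split(x, y, size):
        return [(x, y, size)]
    half = size // 2
    cus = []
    for dy in (0, half):
        for dx in (0, half):
            cus += quadtree_split(x + dx, y + dy, half, min_size, should_split)
    return cus

# Example: split one 64x64 CTU down to 8x8 leaves.
leaves = quadtree_split(0, 0, 64)
print(len(leaves), "CUs")  # 64 CUs of size 8x8
```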
Specifically, as shown in fig. 2, after the encoding frame is divided into a plurality of encoding blocks, a prediction process is performed to remove spatial-domain and temporal-domain redundancy in the current encoding frame. The commonly used predictive coding methods include intra prediction and inter prediction. Intra prediction uses only reconstructed information within the current frame to predict the current encoding block, while inter prediction uses information in other, previously reconstructed frames (also called reference frames) to predict the current encoding block. Specifically, in the embodiment of the present application, the coding control module 201 is configured to decide between intra prediction and inter prediction.
When the intra prediction mode is selected, intra prediction 203 includes: obtaining the reconstructed blocks of the encoded neighboring blocks around the current encoding block as reference blocks; calculating a predicted value based on the pixel values of the reference blocks with a prediction-mode method to generate a prediction block; subtracting the corresponding pixel values of the prediction block from those of the current encoding block to obtain the residual of the current encoding block; and transforming 204, quantizing 205 and entropy coding 210 the residual of the current encoding block to form the code stream of the current encoding block. Furthermore, after all the encoding blocks of the current encoding frame have gone through this encoding process, they form a part of the coded code stream of the encoding frame. In addition, the control and reference data generated in intra prediction 203 are also entropy encoded 210 to form part of the coded code stream.
In particular, the transform 204 is used to remove the correlation of the residual of the image block in order to improve coding efficiency. The residual data of the current encoding block are usually transformed with a two-dimensional discrete cosine transform (DCT) or a two-dimensional discrete sine transform (DST): for example, at the encoding end, the residual information of the encoding block is multiplied by an N x M transform matrix and by its transpose, and the transform coefficients of the current encoding block are obtained after the multiplication.
After the transform coefficients are generated, quantization 205 is used to further improve the compression efficiency: the transform coefficients are quantized to obtain quantized coefficients, and entropy coding 210 is then performed on the quantized coefficients to obtain the residual code stream of the current encoding block, where the entropy coding methods include, but are not limited to, context-based adaptive binary arithmetic coding (CABAC). Finally, the bit stream obtained by entropy coding and the coded coding-mode information are stored or sent to the decoding end. At the encoding end, the quantized result is also dequantized 206, and the dequantized result is inverse transformed 207. After the inverse transform 207, the reconstructed pixels are obtained using the inverse-transform result and the motion-compensation result. The reconstructed pixels are then filtered (i.e., loop filtered) 211, and the filtered reconstructed image (belonging to the reconstructed video frame) is output. Subsequently, the reconstructed image can be used as a reference frame for inter prediction of other frames. In the embodiment of the present application, the reconstructed image may also be referred to as a reconstruction image.
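The transform 204 and quantization 205 can be illustrated numerically. The sketch below builds an orthonormal DCT-II matrix T, transforms a residual block R as T @ R @ T.T, quantizes the coefficients by rounding against a step size, and then inverts both steps; the 4x4 block size and the quantization step are assumed values, and real codecs use integer-approximated transforms with step sizes derived from the quantization parameter.
```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: rows are cosine basis vectors.
    k = np.arange(n)
    T = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    T[0] *= 1 / np.sqrt(n)
    T[1:] *= np.sqrt(2 / n)
    return T

N = 4
T = dct_matrix(N)
residual = np.random.randint(-32, 32, size=(N, N)).astype(float)

coeffs = T @ residual @ T.T          # forward transform (204)
qstep = 8.0
levels = np.round(coeffs / qstep)    # quantization (205)
rec_coeffs = levels * qstep          # inverse quantization (206)
rec_residual = T.T @ rec_coeffs @ T  # inverse transform (207)

print(np.abs(residual - rec_residual).max())  # distortion from quantization
```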
Specifically, the encoded neighboring blocks in the intra prediction 203 process are obtained as follows: before the current encoding block is encoded, the residual generated when a neighboring block was encoded is transformed 204, quantized 205, dequantized 206 and inverse transformed 207, and is then added to the prediction block of that neighboring block to obtain the reconstructed block. Correspondingly, inverse quantization 206 and inverse transform 207 are the inverse processes of quantization 205 and transform 204, and are used to recover the residual data prior to quantization and transformation.
As shown in fig. 2, when the inter prediction mode is selected, the inter prediction process includes motion estimation (ME) 208 and motion compensation (MC) 209. Specifically, the encoding end performs motion estimation 208 according to a reference frame image among the reconstructed video frames, and searches, according to a certain matching criterion, one or more reference frame images for the image block most similar to the current encoding block, which serves as the prediction block; the relative displacement between the prediction block and the current encoding block is the motion vector (MV) of the current encoding block. The residual of the encoding block is obtained by subtracting the corresponding pixel values of the prediction block from the original pixel values of the encoding block. The residual of the current encoding block is transformed 204, quantized 205 and entropy coded 210 to form a part of the coded stream of the encoding frame. At the decoding end, motion compensation 209 is performed based on the determined motion vector and prediction block to obtain the current block.
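Motion estimation 208 can be sketched as a brute-force block search; the sum of absolute differences (SAD) used below is one example of the "certain matching criterion" mentioned above, and the search range is an assumption (practical encoders use much faster search strategies).
```python
import numpy as np

def motion_search(cur_block, ref_frame, x, y, search_range=8):
    """Full search around (x, y) in the reference frame; returns the
    motion vector (dx, dy) minimizing SAD, as in motion estimation 208."""
    h, w = cur_block.shape
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + h > ref_frame.shape[0] or rx + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(cur_block - ref_frame[ry:ry + h, rx:rx + w]).sum()
            if sad < best_sad:
                best_sad, best_mv = sad, (dx, dy)
    return best_mv, best_sad
```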
As shown in fig. 2, the reconstructed video frame is a video frame obtained after being filtered 211. The reconstructed video frame includes one or more reconstructed images. The filtering 211 is used to reduce compression distortion such as blocking effect and ringing effect generated in the encoding process, the reconstructed video frame is used to provide a reference frame for inter-frame prediction in the encoding process, and the reconstructed video frame is output as a final decoded video after post-processing in the decoding process.
In particular, the inter Prediction mode may include an Advanced Motion Vector Prediction (AMVP) mode, a Merge (Merge) mode, or a skip (skip) mode.
For the AMVP mode, a motion vector predictor (MVP) may be determined first. After the MVP is obtained, the starting point of motion estimation is determined according to the MVP, and a motion search is performed near the starting point. The optimal MV is obtained after the search is completed; the MV determines the position of the reference block in the reference image; the residual block is obtained by subtracting the reference block from the current block; the motion vector difference (MVD) is obtained by subtracting the MVP from the MV; and the MVD and the index of the MVP are transmitted to the decoding end through the code stream.
For the Merge mode, the MVP is determined first and is directly used as the MV of the current block. In order to obtain the MVP, an MVP candidate list (merge candidate list) may be constructed first, where the list includes at least one candidate MVP and each candidate corresponds to an index. After selecting an MVP from the MVP candidate list, the encoding end writes its index into the code stream, and the decoding end can then find the MVP corresponding to this index in the MVP candidate list, thereby decoding the image block.
It should be understood that the above process is just one specific implementation of the Merge mode. The Merge mode may also have other implementations.
For example, the Skip mode is a special case of the Merge mode. After obtaining the MV according to the Merge mode, if the encoding end determines that the current block and the reference block are substantially the same, it does not need to transmit residual data; only the index of the MVP needs to be transmitted, and additionally a flag may be transmitted to indicate that the current block may be obtained directly from the reference block.
That is, the Merge mode is characterized by: MV = MVP (MVD = 0); and the Skip mode has one more feature, namely: the reconstructed value rec equals the predicted value pred (residual value resi = 0).
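The relationships among the AMVP, Merge and Skip modes can be summarized in a short sketch; the returned dictionaries are illustrative stand-ins for what each mode signals, not a real bitstream syntax.
```python
def signal_inter_mode(mode, mv, mvp, mvp_index, residual_is_zero):
    """Illustrates what each inter mode puts into the code stream.
    mv and mvp are (x, y) tuples; mvp_index indexes the MVP candidate list."""
    if mode == "AMVP":
        mvd = (mv[0] - mvp[0], mv[1] - mvp[1])   # MVD = MV - MVP
        return {"mvp_index": mvp_index, "mvd": mvd, "residual": True}
    if mode == "Merge":
        assert mv == mvp                          # MV = MVP (MVD = 0)
        return {"mvp_index": mvp_index, "residual": True}
    if mode == "Skip":
        assert mv == mvp and residual_is_zero     # rec = pred (resi = 0)
        return {"mvp_index": mvp_index, "residual": False}
```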
The decoding end performs the operations corresponding to the encoding end. First, residual information is obtained by entropy decoding, inverse quantization and inverse transformation, and the decoded code stream determines whether the current image block uses intra prediction or inter prediction. If intra prediction is used, prediction information is constructed from the reconstructed image blocks in the current frame according to the intra prediction method. If inter prediction is used, the motion information is parsed, and the reference block is determined in the reconstructed image using the parsed motion information to obtain the prediction information. The prediction information and the residual information are then superimposed, and the reconstructed information is obtained through the filtering operation.
The present application mainly relates to encoding and decoding images or videos based on neural network encoding and decoding technology (also referred to as "intelligent encoding and decoding technology"), and therefore, the following briefly introduces relevant contents of the neural network.
Neural networks (NNs), also known as artificial neural networks (ANNs), are information processing systems that share certain performance characteristics with biological neural networks. A neural network system consists of many simple and highly interconnected processing components that process information by responding to the dynamic state of external inputs. The processing components can be thought of as neurons in the human brain, where each perceptron accepts multiple inputs and computes a weighted sum of those inputs. In the field of neural networks, the perceptron is considered a mathematical model of a biological neuron. Furthermore, these interconnected processing components are usually organized in layers. For recognition applications, the external input may correspond to a pattern presented to the network, which communicates with one or more intermediate layers, also referred to as "hidden layers", where the actual processing is done via a system of weighted "connections".
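The weighted-sum behavior of a single perceptron described above takes only a few lines; a minimal sketch:
```python
import numpy as np

def perceptron(inputs, weights, bias, activation=np.tanh):
    """A perceptron computes a weighted sum of its inputs plus a bias,
    then applies a nonlinear activation."""
    return activation(np.dot(weights, inputs) + bias)

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(perceptron(x, w, bias=0.05))
```
Stacking layers of such units, with the connection weights as the trainable model parameters, gives the hidden layers discussed above.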
The ANN may use different architectures to specify which variables and their topological relationships are involved in the network. For example, variables involved in a neural network may be the weights of connections between neurons, as well as the activity of neurons. Most artificial neural networks contain some form of "learning rule" that modifies the weights of connections according to the input pattern presented. In a sense, artificial neural networks learn by example as if they were biological objects.
Deep Neural Networks (DNNs) or Deep multi-layer Neural networks correspond to Neural networks with multiple levels of interconnected nodes that allow them to represent highly non-linear and highly varying functions compactly. However, the computational complexity of DNNs is growing rapidly as the number of nodes associated with a large number of layers increases.
At present, for the management and transmission of the model parameters of the neural network model included in a neural-network-based encoding technique, one approach is to fix the model parameters of the neural network model in a common library file available to both the encoder and the decoder; the encoder and the decoder can then obtain the model parameters from the common library file when encoding or decoding. However, the effectiveness of this approach is reduced once some encoding tools of the encoder are changed.
Another approach is to transmit the model parameters of the neural network model to the decoding end through the code stream, so that the model parameters can be flexibly adjusted according to the requirements of the encoder. However, for a deep-learning neural network model with many layers, the number of parameters is generally large; if the model parameters are put directly into the code stream for transmission, the bit consumption increases and the video compression ratio decreases. In addition, because of the management and transmission requirements of the model parameters, encoding based on neural network techniques poses a greater challenge to low-latency transmission in image transmission scenarios.
The application provides an encoding method, a decoding method, an encoding device and a decoding device, which can effectively solve the problems of managing and transmitting the model parameters of the neural network model, reduce the bit consumption of the code stream, and ease the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
The encoding method 300 provided by the embodiment of the present application will be described in detail below with reference to fig. 3.
As shown in fig. 3, which is a schematic diagram of an encoding method 300 according to an embodiment of the present application, the method 300 may include steps 310 to 330.
310, encoding the image to be encoded using a neural-network-based encoding technique to obtain a code stream of the image to be encoded.
In the embodiment of the present application, encoding an image to be encoded using a neural-network-based encoding technique (which may also be referred to as an "intelligent encoding technique") may be understood as: replacing or supplementing some of the encoding modules in the codec framework with encoding modules based on a neural network model to encode the image to be encoded, or replacing the original encoding framework with an encoding framework based on a neural network model to encode the image to be encoded.
The neural network-based coding technology mainly comprises three aspects: hybrid neural network video coding (i.e., embedding the coding modules of the neural network into a conventional video coding framework instead of conventional coding modules), neural network rate-distortion optimization coding, and end-to-end neural network video coding.
Optionally, in some embodiments, for at least one of prediction, transformation, quantization, entropy coding, and filtering in the encoding process, a part of the encoding module in the encoding framework may be replaced with an encoding module based on a neural network model to encode the image to be encoded.
It should be noted that, for the filtering in the encoding process, a filtering module based on a neural network model may also be added to the existing filtering module to filter the reconstructed image.
Exemplarily, for intra-frame prediction in the encoding process, an intra-frame prediction method based on a neural network is adopted to compare with an original intra-frame prediction method to decide an optimal intra-frame prediction method for prediction; for inter-frame prediction in the encoding process, an image super-resolution technology based on a neural network can be adopted for prediction, so that the motion estimation performance can be improved, and the inter-frame prediction efficiency can be further improved.
For entropy coding in the coding process, a context probability estimation method based on a neural network technology can be adopted to replace a traditional rule-based context probability prediction model, and the method is applied to coefficient coding or entropy coding of some other syntax elements. For filtering in the encoding process, a Neural Network Filter (NNF) technology may be added after deblocking filtering.
The end-to-end neural network video coding is a mode of completely abandoning the traditional mixed video coding framework, directly inputs video images and related information, and adopts a prediction mode based on a neural network to carry out prediction and compression coding.
320, transmitting the code stream of the image to be encoded over a first communication link.
330, transmitting the model parameters of the neural network model included in the neural-network-based encoding technique over a second communication link.
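Steps 310 to 330 can be sketched as follows; the codec object, the link objects and their send methods are hypothetical placeholders, since the embodiment does not prescribe a concrete transport API.
```python
def encode_and_transmit(image, nn_codec, low_latency_link, high_bandwidth_link):
    """Step 310: encode with the neural-network-based technique.
    Step 320: code stream over the first (low-latency) link.
    Step 330: model parameters over the second (high-bandwidth) link."""
    code_stream = nn_codec.encode(image)                 # hypothetical encoder
    low_latency_link.send(code_stream)                   # first communication link
    high_bandwidth_link.send(nn_codec.model_parameters)  # second communication link
```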
Optionally, the neural network model in the embodiment of the present application may include an offline-trained neural network model or an online-trained neural network model, without limitation.
The Neural Network model in the embodiment of the present application may include DNN, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), or other Neural Network variants, which is not specifically limited in the present application.
The model parameters of the neural network model in the embodiment of the present application include, but are not limited to, the number of layers of the neural network, the number of neurons, the weight of connections between neurons, and the like.
Accordingly, for the decoding end, the code stream may be decoded based on the received model parameters of the neural network model. As shown in fig. 4, which is a schematic diagram of a decoding method 400 according to an embodiment of the present application, the method 400 may include steps 410-430.
410, receiving the code stream of the image to be decoded over a first communication link.
420, receiving the model parameters of the neural network model over a second communication link.
430, decoding the code stream using the model parameters of the neural network model to obtain a decoded image.
Optionally, in some embodiments, for at least one of entropy decoding, inverse quantization, inverse transformation, predictive reconstruction, or filtering in the decoding process, a decoding module based on a neural network model may be used to replace a part of a decoding module in the decoding framework for decoding the image to be decoded.
It should be noted that, for the filtering in the decoding process, a filtering module based on a neural network model may also be added to the existing filtering module to filter the predicted reconstructed image.
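Correspondingly, steps 410 to 430 at the decoding end might look like the following sketch, again with hypothetical link, loader and decoder objects:
```python
def receive_and_decode(low_latency_link, high_bandwidth_link, nn_decoder):
    """Step 410: code stream from the first link.
    Step 420: model parameters from the second link.
    Step 430: decode the code stream with the received parameters."""
    code_stream = low_latency_link.receive()
    params = high_bandwidth_link.receive()
    nn_decoder.load_model_parameters(params)  # hypothetical loader
    return nn_decoder.decode(code_stream)
```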
According to the solution provided by the application, the encoding end transmits the code stream of the image to be encoded and the model parameters of the neural network model over the first communication link and the second communication link respectively, and the decoding end correspondingly receives them over the two links. This effectively solves the problems of managing and transmitting the model parameters of the neural network model; transmitting the model parameters over the second communication link reduces the bit consumption of the code stream; in addition, it eases the challenge that the management and transmission requirements of the model parameters pose to low-latency transmission.
Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
The physical characteristics in the embodiments of the present application may be determined by specific application requirements and may include, for example, transmission delay and/or transmission bandwidth. It is to be understood that the physical characteristics may also be other characteristics, and the present application is not limited in this respect.
Optionally, in some embodiments, the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
In the embodiment of the application, compared with the second communication link, the first communication link has a lower transmission delay, a lower transmission bandwidth, or both; in other words, the second communication link has a higher transmission delay and/or a higher transmission bandwidth than the first communication link.
It should be noted that, during the process of encoding the image to be encoded, the encoding end may transmit the coded code stream in real time so that the decoding end can decode it in time. Therefore, the low-latency first communication link may be chosen to transmit the code stream of the image to be encoded.
It should be further noted that, while the encoding end encodes the image to be encoded using the neural network model, the model parameters of the neural network model may be transmitted over the second communication link. Since a neural network model has many model parameters, especially a complex model with many layers, the higher-bandwidth second communication link may be chosen to transmit them.
According to the solution provided by the application, the code stream of the image to be encoded is transmitted over the first communication link with the lower transmission delay, and the model parameters of the neural network model are transmitted over the second communication link with the higher transmission bandwidth. This effectively solves the problems of managing and transmitting the model parameters of the neural network model, and further eases the challenge that their management and transmission requirements pose to low-latency transmission.
Optionally, in some embodiments, the first communication link includes a link whose latency is less than or equal to a first threshold, and the second communication link includes a link whose bandwidth is greater than or equal to a second threshold.
The first threshold and/or the second threshold in the embodiment of the present application may be specified by a protocol, or may be configured by a server; the first threshold and/or the second threshold may be fixed values, or may be continuously adjusted and changed values; without limitation.
Optionally, in some embodiments, the first communication link includes a link based on a private image transmission protocol or a wireless local area network protocol, and the second communication link includes a link based on a mobile communication protocol.
Optionally, in some embodiments, the private image transmission protocol includes a software-defined radio (SDR) protocol, the wireless local area network protocol includes a wireless fidelity (WiFi) protocol, and the mobile communication protocol includes a 4G or 5G protocol.
It should be noted that, in this embodiment of the present application, the private image transmission protocol is shown as the SDR protocol; the private image transmission protocol may also include other protocols, such as the Open Network Video Interface Forum (ONVIF) protocol, without limitation.
The wireless local area network protocol in the embodiment of the present application may also include, in addition to the WiFi protocol shown above, other protocols, such as Bluetooth and ZigBee, without limitation.
In addition, the mobile communication protocol in the embodiment of the present application may include, in addition to the 4G or 5G protocol shown above, other protocols, such as a future 6G protocol.
Of course, in some embodiments, the first communication link may also be a link based on the SDR protocol, and the second communication link may be a link based on the WiFi protocol.
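One way to realize the threshold-based choice of the two links (latency at most the first threshold for the first link, bandwidth at least the second threshold for the second link) is sketched below; the thresholds and the per-link latency and bandwidth figures are assumed values, not values prescribed by the embodiment.
```python
# Hypothetical link descriptors: (name, latency in ms, bandwidth in Mbit/s).
links = [("SDR", 20, 40), ("WiFi", 30, 100), ("5G", 50, 300)]

FIRST_THRESHOLD_MS = 30      # first link: latency <= first threshold
SECOND_THRESHOLD_MBPS = 100  # second link: bandwidth >= second threshold

first_link = min((l for l in links if l[1] <= FIRST_THRESHOLD_MS), key=lambda l: l[1])
second_link = max((l for l in links if l[2] >= SECOND_THRESHOLD_MBPS), key=lambda l: l[2])

print("code stream over", first_link[0])        # e.g. SDR for the bitstream
print("model parameters over", second_link[0])  # e.g. 5G for the parameters
```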
Optionally, in some embodiments, the first communication link includes a private image transmission link, and the second communication link includes a public network transmission link.
Fig. 5 is a schematic diagram of image transmission with the intelligent coding technique provided in the embodiment of the present application. After the acquisition end (which can also be understood as the encoding end) acquires a video, the code stream and the corresponding model parameters of the neural network model are obtained after compression encoding by the encoder and the neural network computing platform, and the code stream and the model parameters are then transmitted to the display end through the wireless image transmission system. After receiving the code stream and the corresponding model parameters of the neural network model, the display end (which can also be understood as the decoding end) decodes them with its decoder and neural network computing platform to obtain the reconstructed video, and shows it on its display.
The wireless image transmission system may be a wireless video transmission system and may include a private image transmission link and a public network transmission link, which respectively transmit the code stream and the model parameters of the neural network model.
The private image transmission link in the embodiment of the application may include a link based on the SDR, WiFi or ONVIF protocol; the public network transmission link in the embodiment of the present application may include a link based on a 4G, 5G or 6G protocol.
According to the solution provided by the application, the code stream of the image to be encoded is transmitted over the private image transmission link, and the model parameters of the neural network model are transmitted over the public network transmission link, which effectively solves the problems of managing and transmitting the model parameters of the neural network model.
Optionally, in some embodiments, the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, a link based on an ethernet protocol.
In the embodiment of the application, when transmitting the code stream of the image to be encoded and the model parameters of the neural network model, the encoding end can flexibly select among one or more links, which improves flexibility.
The above indicates that the neural network model includes an offline-trained neural network model or an online-trained neural network model, and the following will take these two neural network models as examples to respectively describe the relevant contents of encoding the image to be encoded by using the neural network-based encoding technique.
Case one: the image is encoded using an encoding technique based on an online-trained neural network. Approach one:
Optionally, in some embodiments, the neural network model is an online-trained neural network model, and encoding the image to be encoded using a neural-network-based encoding technique includes:
for the n-th target image, encoding the n-th target image using a neural network model obtained by training on the (n-m)-th target image, where a target image is an image obtained by dividing the image to be encoded at any one of the video sequence level, the group-of-pictures level, and the picture level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
A target image in the embodiment of the present application may be an image obtained by dividing the image to be encoded at any one of the video sequence level, the group of pictures (GOP) level, and the picture level (also called the frame level). The length of a GOP can be defined freely; generally, the pictures from the current I frame up to the picture before the next I frame in a video sequence can be taken as one GOP.
To facilitate understanding of the scheme of the present application, the GOP is briefly described first. A GOP consists of a set of consecutive pictures, made up of one I frame and several B frames and/or P frames, and is the basic access unit of video image encoders and decoders. An I frame (also called a key frame) is an intra-coded image frame, which can be understood as a complete preservation of that picture. A B frame, i.e. a bidirectional reference frame or bidirectional difference frame, records the differences between the current frame and both the preceding and following frames; during decoding, the previously buffered image and the decoded following image are obtained, and the final image is obtained by combining their data with the data of the current frame. A P frame, i.e. a forward reference frame or forward predicted frame, records the difference between the current frame and the previous frame; during decoding, the difference defined by the current frame is superimposed on the previously buffered image to generate the final image.
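As a concrete illustration of the GOP division described above, the following sketch groups a frame sequence into GOPs, starting a new GOP at each I frame; the frame-type sequence is an assumed example.
```python
def split_into_gops(frame_types):
    """Group frames into GOPs: each GOP runs from one I frame up to
    (but excluding) the next I frame."""
    gops, current = [], []
    for t in frame_types:
        if t == "I" and current:
            gops.append(current)
            current = []
        current.append(t)
    if current:
        gops.append(current)
    return gops

print(split_into_gops(list("IBBPBBPIBBPBBP")))
# [['I','B','B','P','B','B','P'], ['I','B','B','P','B','B','P']]
```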
In addition, the target image in the embodiment of the present application differs according to the level at which the image to be encoded is divided.
If the target image is obtained by dividing the image to be encoded at the video sequence level, encoding the n-th target image with the neural network model obtained by training on the (n-m)-th target image can be understood as: taking all images to be encoded contained in the video as one target image, and encoding that target image based on a pre-trained neural network model.
If the target image is obtained by dividing the image to be encoded at the group-of-pictures level, encoding the n-th target image with the neural network model obtained by training on the (n-m)-th target image can be understood as: encoding each image included in the n-th GOP using the neural network model obtained by training on the (n-m)-th GOP, where a GOP comprises one I frame and several B frames and/or P frames.
If the target image is obtained by dividing the image to be encoded at the picture level, encoding the n-th target image with the neural network model obtained by training on the (n-m)-th target image can be understood as: encoding the n-th image with the neural network model obtained by training on the (n-m)-th image, where an image can be understood as an image frame, such as an I frame, B frame or P frame as described above.
It should be noted that in the embodiment of the present application, m < n; that is, a model trained on any target image before the n-th target image may be used to encode it.
For example, for the 3rd target image, the neural network model obtained by training on the 2nd target image may be used to encode it, and the model parameters of that model are transmitted to the decoding end at the same time, which facilitates decoding. Alternatively, for the 3rd target image, the neural network model obtained by training on the 1st target image may be used, and the model parameters of that model may already have been transmitted when the 2nd target image was encoded; in this case, the model parameters of the neural network model adopted for the 3rd target image need not be transmitted again, and the decoding end only needs to be informed of which model parameters to adopt. Especially for a complex neural network model with many layers, this not only satisfies the latency requirement of transmitting many model parameters, but also further saves bandwidth.
According to the scheme provided by the application, encoding the n-th target image with the neural network model obtained by training on the (n-m)-th target image improves the flexibility of encoding and suits a variety of scenes, and a neural network model trained for each scene yields a better prediction effect. In addition, transmitting the model parameters of models trained on spaced target images not only satisfies the latency requirement of transmitting many model parameters, but also further saves bandwidth.
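The (n-m) scheduling can be made concrete with a short sketch; train_model and encode are hypothetical stand-ins for the online training and encoding steps, and a preset model is assumed to cover the first m target images, as described in the embodiments below.
```python
def encode_sequence(target_images, m, pretrained_model, train_model, encode):
    """Encode the n-th target image with the model trained on the
    (n-m)-th target image; the first m images use a preset model."""
    trained = []  # trained[k] is the model trained on the (k+1)-th target image
    streams = []
    for n, image in enumerate(target_images, start=1):
        model = trained[n - m - 1] if n > m else pretrained_model
        streams.append(encode(image, model))
        trained.append(train_model(image))  # train online for later images
    return streams
```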
As mentioned above, the neural network based encoding technique can be applied to any stage in the encoding process, and the neural network based filtering technique is exemplified below.
Optionally, in some embodiments, encoding the n-th target image using the neural network model obtained by training on the (n-m)-th target image includes:
filtering the n-th target image using a neural network model obtained by training on the (n-m)-th target image, where the (n-m)-th target image is the image of the (n-m)-th encoded image that has not been filtered by the neural network model, and the n-th target image is the image of the n-th encoded image that has not been filtered by the neural network model.
Fig. 6 is a schematic diagram of an encoding framework 2 according to an embodiment of the present application.
After the encoding end obtains the reconstructed pixel by inverse quantization 206 and inverse transformation 207, the reconstructed pixel may be filtered 211. In the filtering process, the filtered reconstructed image may be output by performing any one or more of Deblocking Filtering (DF), NNF, Sample Adaptive Offset (SAO), or Adaptive Loop Filtering (ALF) on the reconstructed pixels.
Taking a GOP as the target image as an example: when the 1st GOP of the current image to be encoded is encoded, the images of the 1st GOP are used as the training set, and the model training process for the neural-network-based filtering technique is carried out. The whole training process can proceed as follows: during encoding, the current image to be encoded is sent into the framework shown in fig. 6 for encoding and training; for example, the 1st GOP is inverse quantized and inverse transformed to obtain the reconstructed image, and DF, NNF, SAO and ALF are performed on the reconstructed image. After all the images in the 1st GOP have been encoded, the code stream of all the images in the 1st GOP is obtained, together with a neural network model trained within the neural network framework on all the images of the 1st GOP.
When the 2nd GOP of the current image to be encoded is encoded, the neural-network-based filtering technique uses the neural network model obtained by training on the 1st GOP. The specific filtering process can proceed as follows: when an image in the 2nd GOP is encoded, the reconstructed image is obtained after inverse quantization and inverse transformation, and the reconstructed image is then sent to the filtering module equipped with the neural network model obtained by training on the 1st GOP to obtain the filtered reconstructed image; the filtering module here comprises a DF module, an NNF module, an SAO module and an ALF module, and the NNF module contains the neural network model obtained by training on the 1st GOP.
By analogy, when the n-th GOP is encoded, the neural-network-based filtering technique uses the neural network model obtained by training on the (n-m)-th GOP. The specific filtering process can proceed as follows: when an image in the n-th GOP is encoded, the reconstructed image is obtained after inverse quantization and inverse transformation, and the reconstructed image is then sent to the filtering module equipped with the neural network model obtained by training on the (n-m)-th GOP to obtain the filtered reconstructed image; similarly, the filtering module here comprises a DF module, an NNF module, an SAO module and an ALF module, and the NNF module contains the neural network model obtained by training on the (n-m)-th GOP.
It is understood that the filtering order shown in fig. 6 is only an example; other orders, such as DF, SAO, NNF and ALF, are also possible, and the present application should not be limited in this respect.
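The in-loop filter chain of fig. 6 can be sketched as a simple pipeline; the individual filter functions are hypothetical placeholders, and the NNF stage consumes the neural network model trained on the (n-m)-th GOP.
```python
def loop_filter(reconstructed, nnf_model, df, nnf, sao, alf):
    """Apply the fig. 6 filter chain to a reconstructed image:
    DF -> NNF -> SAO -> ALF (other orders are also possible)."""
    image = df(reconstructed)        # deblocking filtering
    image = nnf(image, nnf_model)    # neural network filter, model from GOP n-m
    image = sao(image)               # sample adaptive offset
    return alf(image)                # adaptive loop filtering
```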
Here, m may be any positive integer less than n. For example, the 3rd GOP may be encoded by using the neural network model obtained by training on the 2nd GOP, or by using the neural network model obtained by training on the 1st GOP, without limitation.
According to the scheme provided in the present application, the nth target image is filtered by using the neural network model obtained by training on the (n-m)th target image, which improves the flexibility of filtering.
Optionally, in some implementations, m is a parameter fixed at or preset at the encoding end before encoding; or m is a parameter formed during the encoding process.
In this embodiment of the present application, m may be a parameter formed during encoding. For example, if m is 2, the 3rd GOP may be encoded by using the neural network model obtained by training on the 1st GOP; the 4th GOP may be encoded by using the neural network model obtained by training on the 2nd GOP; the 5th GOP may be encoded by using the neural network model obtained by training on the 3rd GOP; and so on, without limitation. The images in the first two GOPs (the 1st and 2nd GOPs) may be encoded by using a preset neural network model.
It should be understood that the above numerical values are only examples; other numerical values are also possible, and this should not be construed as limiting the present application.
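To make this per-GOP flow concrete, the following is a minimal sketch. It assumes hypothetical `encode_gop` and `train_model` callables and a `preset_model` fallback, none of which are interfaces defined in this application; it only illustrates that GOP n is encoded with the model trained on GOP n-m, with the preset model covering the first m GOPs.

```python
# Minimal sketch of method one: GOP n is encoded with the model trained on
# GOP n-m; the first m GOPs fall back to a preset model. encode_gop and
# train_model are hypothetical callables, not interfaces of this application.

def encode_sequence_online(gops, m, preset_model, encode_gop, train_model):
    trained = {}        # 1-based GOP index -> model trained on that GOP
    bitstreams = []
    for n, gop in enumerate(gops, start=1):
        # Use the model trained on GOP n-m if it exists; otherwise use the
        # preset model (this covers the 1st through mth GOPs).
        model = trained.get(n - m, preset_model)
        bitstream, reconstruction = encode_gop(gop, model)  # NNF uses model
        bitstreams.append(bitstream)
        # Once GOP n is fully encoded, train a model on it; that model is
        # used when GOP n+m is encoded.
        trained[n] = train_model(gop, reconstruction)
    return bitstreams
```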
Method two:
optionally, in some embodiments, the neural network model is an online-trained neural network model, and the encoding the image to be encoded by using a neural network-based encoding technique includes:
for a first target image to be coded, coding the first target image by using a neural network model obtained by training a coded second target image, wherein the first target image and the second target image are images obtained by dividing the image to be coded according to any one of a video sequence level, an image group level and an image level;
the transmitting, via a second communication link, model parameters included in the neural network-based encoding technique includes: and transmitting model parameters of the neural network model obtained by training the second target image through the second communication link.
Optionally, in some embodiments, the first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
Similarly, the first target image and the second target image in the embodiment of the present application may be images obtained by dividing an image to be encoded by any one of a video sequence level, a GOP level, and an image level.
In the embodiment of the present application, when a first target image is encoded, the first target image may be encoded by using a neural network model obtained by training on an encoded second target image. Assuming that the first target image is the 2nd GOP of the current image to be encoded, it can be encoded by using a neural network model obtained by training on the encoded 1st GOP; assuming that the first target image is the 3rd GOP, it can be encoded by using a neural network model obtained by training on the encoded 2nd GOP; this is not a particular limitation of the present application.
In this embodiment of the present application, q may be an integer greater than or equal to 0. Taking a GOP as the target image as an example, if q is 0, the second target image is the previous GOP adjacent to the first target image; if q is 1, the second target image is a previous GOP separated from the first target image by one GOP; if q is 2, the second target image is a previous GOP separated from the first target image by two GOPs.
Optionally, in some embodiments, q is a parameter fixed at or preset at the encoding end before encoding.
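As an illustration of how q selects the second target image, the following sketch assumes an index convention in which target images are numbered from 1; the convention is an assumption for illustration only.

```python
def second_target_index(first_index, q):
    # Target images are numbered from 1; the second target image precedes
    # the first by q + 1 positions: q = 0 selects the adjacent previous
    # GOP, q = 1 skips one GOP in between, and so on.
    index = first_index - (q + 1)
    if index < 1:
        raise ValueError("no encoded second target image exists yet")
    return index

assert second_target_index(2, 0) == 1  # 2nd GOP uses the 1st-GOP model
assert second_target_index(5, 1) == 3  # one GOP lies between the two
```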
According to the scheme provided in the present application, the encoding end encodes the first target image to be encoded by using the neural network model obtained by training on the encoded second target image, which improves encoding flexibility and makes the method applicable to various scenes; for a neural network model trained on a given scene, prediction based on that model performs better.
Correspondingly, the decoding end decodes the code stream of the first target image by using the received model parameters of the neural network model obtained by training on the second target image, so as to obtain the first target image.
Optionally, in some embodiments, the first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
Optionally, in some embodiments, q is a parameter fixed at or preset at the decoding end before decoding.
In the embodiment of the application, the decoding end may decode the code stream of the first target image by using the received model parameters of the neural network model obtained by training on the second target image. During decoding, the decoding end may determine, according to the parameter q fixed or preset at the decoding end, the target image to be decoded that corresponds to the model parameters of the neural network model obtained by training on the second target image.
Exemplarily, taking a GOP as the target image as an example: if q is 0, when the decoding end decodes the first target image, it may decode the code stream of the first target image by using the model parameters of the neural network model obtained by training on the previous GOP adjacent to the first target image; if q is 1, it may decode the code stream by using the model parameters of the neural network model obtained by training on the previous GOP separated from the first target image by one GOP; if q is 2, it may decode the code stream by using the model parameters of the neural network model obtained by training on the previous GOP separated from the first target image by two GOPs.
According to this scheme, the decoding end decodes the code stream of the first target image by using the received model parameters of the neural network model obtained by training on the second target image, which improves flexibility and makes the method applicable to various scenes; for a neural network model trained on a given scene, prediction and reconstruction based on that model perform better.
Case two: encoding the image to be encoded by using an encoding technique based on an offline-trained neural network
Optionally, in some embodiments, the neural network model is an offline-trained neural network model, and the encoding the image to be encoded by using a neural network-based encoding technique includes:
and for the p target image, encoding the p target image by using the offline-trained neural network model, wherein the target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level and an image level, and p is an integer greater than or equal to 0.
In this embodiment of the application, encoding the pth target image by using the offline-trained neural network model may include: performing at least one of prediction, transformation, quantization, entropy encoding and filtering on the pth target image by using the offline-trained neural network model.
In the following, the neural-network-based filtering technique is taken as an example. Assuming that the target image is obtained by dividing the image to be encoded at the GOP level, the neural network model of the neural-network-based filtering technique may be trained on a training set covering most video scenes, so as to obtain a trained neural network model. The acquisition end deploys the trained neural network model on a neural network computing platform and encodes the pth GOP in combination with an encoder to obtain a code stream; the code stream and the model parameters of the neural network model are then transmitted to the display end through a wireless image transmission system (using the dual-communication-link mode). After receiving the model parameters and the code stream, the display end deploys the neural network model based on the model parameters on its own neural network computing platform and, in combination with a decoder, decodes the code stream based on the model parameters to obtain a reconstructed video for display.
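A minimal sketch of this dual-link transport follows; `encoder`, `link1_send`, and `link2_send` are hypothetical stand-ins for the real encoder and for the two channels of the wireless image transmission system.

```python
# Sketch of case two at the acquisition end: the bitstream travels over the
# first communication link, the model parameters over the second link.
# All callables here are hypothetical placeholders.

def transmit_offline_coded(gops, offline_params, encoder,
                           link1_send, link2_send):
    # The offline-trained parameters do not change during encoding, so a
    # single transmission over the second (high-bandwidth) link may suffice.
    link2_send(offline_params)
    for gop in gops:
        bitstream = encoder(gop, offline_params)  # encode the p-th GOP
        link1_send(bitstream)                     # first (low-latency) link
```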
According to the scheme provided in the present application, the encoding end encodes the pth target image by using the offline-trained neural network model, and transmits the code stream and the model parameters of the neural network model through the first communication link and the second communication link respectively, which well solves the problems of managing and transmitting the model parameters of the neural network model.
As described above, after the image to be encoded is encoded by using the neural-network-based encoding technique, the model parameters of the neural network model included in the technique may be transmitted through the second communication link. For an online-trained neural network model, the model parameters are continuously updated as encoding proceeds, so the model parameters of multiple neural network models are transmitted to the decoding end, and the decoding end needs to determine the model parameters of the neural network model corresponding to the code stream to be decoded. For an offline-trained neural network model, the encoding end may train multiple neural network models and encode different target images with different models; it may therefore also transmit the model parameters of multiple neural network models to the decoding end, and the decoding end again needs to determine the model parameters of the neural network model corresponding to the code stream to be decoded. How the decoding end determines these model parameters is described below.
Optionally, in some embodiments, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model used when the image to be encoded is encoded.
Case one: encoding with an online-trained neural network model
As noted above, for the nth GOP, the neural-network-based filtering technique is performed with the neural network model trained on the (n-m)th GOP when the nth GOP is encoded. The specific filtering process may proceed as follows: when an image in the nth GOP is encoded, the image is sent to the HM platform for encoding to obtain a reconstructed image; the reconstructed image is then sent to a filtering module deployed with the neural network model trained on the (n-m)th GOP to obtain a filtered reconstructed image, which then goes through the subsequent SAO and ALF processes to obtain the final reconstructed image.
In the embodiment of the application, when the encoding end encodes the 1st GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the 1st GOP is encoded; when the encoding end encodes the 2nd GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the 2nd GOP is encoded; and so on, so that when the encoding end encodes the nth GOP, the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the nth GOP is encoded.
After receiving the code stream and the model parameters of the neural network model, the decoding end can determine, according to the first indication information in the code stream, the identifier of the model parameters of the neural network model corresponding to the code stream of the image to be decoded; based on this identifier, it can determine the model parameters of the neural network model adopted when the code stream was encoded, and decode the code stream according to those model parameters, thereby obtaining the final reconstructed image.
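A minimal decoder-side sketch of this matching step follows, assuming for illustration that the first indication information has been parsed into a model identifier and that parameters arriving over the second link are stored under their identifiers; `parse_first_indication` and `decode_with` are hypothetical callables.

```python
received_models = {}  # identifier of model parameters -> model parameters

def on_second_link(model_id, params):
    # Model parameters arriving over the second communication link are
    # stored under their identifier.
    received_models[model_id] = params

def decode_gop(bitstream, parse_first_indication, decode_with):
    # The first indication information in the code stream names the model
    # whose parameters were used at the encoding end.
    model_id = parse_first_indication(bitstream)
    return decode_with(bitstream, received_models[model_id])
```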
Case two: encoding with offline trained neural network models
In the embodiment of the application, when the encoding end trains the neural network model for the neural-network-based filtering technique, it may train a plurality of neural network models; for example, the encoding end may use different neural network models when filtering different target images in the image to be encoded.
Exemplarily, assume that the encoding end trains 3 neural network models, namely neural network model 1, neural network model 2 and neural network model 3. When encoding the 1st GOP, it filters the GOP with neural network model 1, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the 1st GOP is encoded; when encoding the 2nd GOP, it filters the GOP with neural network model 2, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the 2nd GOP is encoded; when encoding the 3rd GOP, it filters the GOP with neural network model 3, and the corresponding code stream may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the 3rd GOP is encoded.
Correspondingly, after receiving the code stream and the model parameters of the neural network model, the decoding end can determine, according to the first indication information in the code stream, the identifier of the model parameters of the neural network model corresponding to the code stream of the image to be decoded; based on this identifier, it can determine the model parameters of the neural network model adopted when the code stream was encoded, and decode the code stream according to those model parameters, thereby obtaining the final reconstructed image.
It should be noted that, in the process of filtering the target images, the encoding end may filter a target image by using any trained neural network model. For example, when encoding the 1st GOP and the 2nd GOP, it may filter them by using neural network model 1; when encoding the 3rd GOP, it may filter it by using neural network model 3; without limitation.
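Such a per-GOP assignment can be sketched as a small lookup table; the model numbering and the default below are hypothetical, and any assignment policy is allowed, since the chosen identifier is simply what the first indication information carries.

```python
# Hypothetical assignment of offline-trained models to GOPs; any policy is
# allowed. The selected identifier is written into the code stream as the
# first indication information.
MODEL_FOR_GOP = {1: "model_1", 2: "model_1", 3: "model_3"}

def first_indication_for(gop_index):
    # Fall back to model_1 for GOPs not listed (an arbitrary default).
    return MODEL_FOR_GOP.get(gop_index, "model_1")

assert first_indication_for(2) == "model_1"
assert first_indication_for(3) == "model_3"
```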
According to the scheme provided by the application, the first indication information is added in the code stream to indicate the identification of the model parameter of the neural network model adopted by the encoding end when the image to be encoded is encoded, so that the decoding end can conveniently determine the model parameter corresponding to the image to be decoded when decoding, and the decoding accuracy can be improved.
Optionally, in some embodiments, if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model used when the image to be encoded is encoded. If the image to be encoded is a non-key frame, the code stream may not include the first indication information.
Optionally, in some embodiments, the first indication information is further used for indicating identification of model parameters of the neural network model adopted when encoding other frames from the current key frame to the next key frame.
The key frame in the embodiment of the present application is the I frame described above. It can be understood that, if the current frame to be encoded is an I frame, the code stream obtained after the encoding end encodes the current I frame may include first indication information indicating the identifier of the model parameters of the neural network model adopted when the current I frame is encoded, so as to facilitate correct decoding at the decoding end.
In addition, if the current frame to be encoded is an I frame and the subsequent frame to be encoded is a B frame and/or a P frame, the first indication information may indicate, in addition to the identifier of the model parameter of the neural network model used in encoding the current I frame, an identifier of the model parameter of the neural network model used in encoding other frames (e.g., B frame and/or P frame) between the current I frame and the next I frame.
In other words, in the process of encoding the current I frame, the encoding end adds the first indication information in the encoded code stream of the current I frame, and the first indication information may be used to indicate the identification of the model parameters of the neural network model used when encoding the current I frame and other frames (such as B frames and/or P frames) between the current I frame and the next I frame.
For the decoding end, when it decodes an I frame, if the frames adjacent to the I frame are B frames and/or P frames, the decoding end may continue to decode those B frames and/or P frames by using the model parameters used when the I frame was decoded, until the next I frame is decoded.
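A decoder-side sketch of this key-frame rule follows, assuming for illustration that each frame exposes an `is_key` flag and that `parse_first_indication` and `decode_with` are available; all three are hypothetical.

```python
def decode_frames(frames, received_models,
                  parse_first_indication, decode_with):
    # The stream is assumed to begin with a key frame (I frame).
    current_params = None
    for frame in frames:
        if frame.is_key:
            # Only key frames carry the first indication information;
            # cache the model parameters it identifies.
            current_params = received_models[parse_first_indication(frame)]
        # B/P frames reuse the parameters cached at the latest I frame,
        # until the next I frame replaces them.
        yield decode_with(frame, current_params)
```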
According to the scheme provided in the present application, the first indication information included in the code stream indicates the identifier of the model parameters of the neural network model adopted when the current I frame, and the other frames between the current I frame and the next I frame, are encoded. This makes it convenient for the decoding end to determine the model parameters corresponding to the image to be decoded, which improves decoding accuracy. In addition, the first indication information may be carried only in the I frame rather than in the B frames or P frames, which further reduces bit consumption.
Optionally, in some embodiments, the transmitting, through the second communication link, the model parameters of the neural network model included in the neural network-based encoding technique includes:
and transmitting the model parameters of the neural network model and the identification corresponding to the model parameters of the neural network model through the second communication link.
Correspondingly, for the decoding end, the model parameters of the neural network model and the identification corresponding to the model parameters of the neural network model are received through the second communication link.
In this embodiment of the application, the encoding end may also transmit, through the second communication link, the model parameters of the neural network model together with the identifier corresponding to those model parameters. The decoding end receives them through the second communication link, determines through the identifier the model parameters of the neural network model corresponding to the code stream, and decodes the code stream based on the determined model parameters.
In some embodiments, for the neural-network-based encoding technique, an available neural network model may be obtained through offline training on an existing video-scene training set, and this model is used as the basic neural network model implemented at the encoding end.
In actual application, the neural network model may be retrained for different application scenes. During encoding, the encoding end may choose to encode with the retrained neural network model or with the existing basic neural network model, and it writes into the code stream a syntax element indicating whether the retrained neural network model is used.
When the encoding end selects the retrained neural network model for encoding, the model parameters of the retrained neural network model may be transmitted through the second communication link, and a syntax element indicating that the retrained neural network model is used is encoded and written into the code stream; when the encoding end selects the existing basic neural network model for encoding, the model parameters of the retrained neural network model do not need to be transmitted to the decoding end through the second communication link, and a syntax element indicating that the retrained neural network model is not used may be encoded and written into the code stream.
The decoding end decodes, from the received code stream, the identifier indicating whether the retrained neural network model is used, and accordingly decodes either with the existing basic neural network model or with the retrained neural network model transmitted through the second communication link.
In one implementation, it is assumed that the encoding end obtains a usable neural network model 1 through offline training on an existing video-scene training set. When the encoding end encodes the 1st GOP of the image to be encoded and determines that neural network model 1 cannot be used for encoding the 1st GOP, the 1st GOP may be used as a training set for the model training process of the neural-network-based encoding technique, so as to obtain the code stream of all images in the 1st GOP and the neural network model trained by the neural network framework on all images in the 1st GOP. In addition, the model parameters of this neural network model may be transmitted to the decoding end through the second communication link, and the syntax element indicating that the retrained neural network model is used may be written into the code stream. The other GOPs of the image to be encoded may be encoded in a similar manner, which is not described again here.
In another implementation, it is assumed that the encoding end obtains a usable neural network model 1 through offline training on an existing video-scene training set. When the encoding end encodes the 1st GOP of the image to be encoded and determines that neural network model 1 can be used for encoding the 1st GOP, the 1st GOP may be encoded by using neural network model 1 to obtain the code stream of all images in the 1st GOP, and the syntax element indicating that the retrained neural network model is not used is written into the code stream. The model parameters of neural network model 1 may be preset at the decoding end, or may be transmitted to the decoding end through the second communication link, without limitation. The other GOPs of the image to be encoded may be encoded in a similar manner, which is not described again here.
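The retrain-or-reuse decision can be sketched as follows; `suitable`, `retrain`, `encode_gop`, `write_flag`, and `link2_send` are hypothetical callables assumed for illustration, and the sketch compresses the per-GOP flow into one function.

```python
def encode_gop_with_choice(gop, basic_model, suitable, retrain,
                           encode_gop, write_flag, link2_send):
    if suitable(basic_model, gop):
        model, use_retrained = basic_model, False
    else:
        # Retrain on this GOP and ship the new parameters over the
        # second communication link.
        model = retrain(gop)
        use_retrained = True
        link2_send(model)
    bitstream = encode_gop(gop, model)
    # The syntax element recording the choice travels inside the code
    # stream itself, over the first communication link.
    write_flag(bitstream, "use_retrained_model", use_retrained)
    return bitstream
```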
According to the scheme provided by the application, the encoding end can select to use the existing basic neural network model for encoding, and also can select to use the retrained neural network model for encoding, so that the encoding flexibility can be improved.
Optionally, in some embodiments, the codestream further includes second indication information, where the second indication information is used to indicate that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
In this embodiment, the code stream transmitted through the first communication link may include second indication information indicating that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
After receiving the code stream transmitted by the encoding end, the decoding end can determine a training mode of the neural network model through second indication information in the code stream, determine the neural network model by combining the training mode of the neural network model and the received model parameters of the neural network model, and decode the code stream based on the neural network model.
As indicated above, the encoding end transmits the model parameters of the neural network model through the second communication link. In some implementations, the model parameters of the neural network model may be processed first, and the processed model parameters transmitted to the decoding end, as described in detail below.
Optionally, in some embodiments, the method 300 further comprises:
converting the model parameters of the neural network model into a target format;
compressing the model parameters of the target format to obtain compressed model parameters;
the transmitting, via a second communication link, model parameters of a neural network model included in the neural network-based encoding technique includes: transmitting the compressed model parameters over the second communication link.
Accordingly, for the decoding end, the receiving the model parameters of the neural network model through the second communication link in step 420 includes: receiving the compressed model parameters over the second communication link;
the decoding the code stream by using the model parameters of the neural network model in step 430 includes: decompressing the compressed model parameters to obtain the target format; converting the target format back; and decoding the code stream by using the model parameters in the converted format.
In the embodiment of the application, after encoding the image to be encoded by using the neural network model, the encoding end may perform format conversion on the model parameters of the neural network model to obtain a compressible target format, compress the target format to obtain compressed model parameters, and transmit the compressed model parameters to the decoding end through the second communication link. After receiving the compressed model parameters, the decoding end may decompress them to obtain the target format, convert the target format back to obtain the model parameters of the neural network model corresponding to the code stream, and decode the code stream by using those model parameters.
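An end-to-end sketch of this parameter path follows, using a plain dictionary serialized with `json` as a stand-in for an NNEF/ONNX export and `zlib` as a stand-in for the NNR/AITISA compression described below; both stand-ins are assumptions for illustration only.

```python
import json
import zlib

def pack_parameters(params: dict) -> bytes:
    # "Convert to the target format": serialize to a portable byte string
    # (a real system would export NNEF or ONNX, described below).
    target_format = json.dumps(params).encode("utf-8")
    # "Compress the target format" (a real system would use MPEG NNR or
    # the AITISA method, described below).
    return zlib.compress(target_format)

def unpack_parameters(blob: bytes) -> dict:
    target_format = zlib.decompress(blob)  # decompress to the target format
    return json.loads(target_format)       # convert the format back

weights = {"layer1": [0.1, -0.2], "layer2": [0.05]}
assert unpack_parameters(pack_parameters(weights)) == weights
```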
Optionally, in some embodiments, the target format includes the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
The NNEF and ONNX formats in the embodiment of the application are two similar open formats used to represent and exchange neural networks between deep learning frameworks and inference engines. The core of both formats is a set of common operations from which a network can be constructed.
NNEF reduces fragmentation in machine-learning deployment by enabling a rich combination of neural network training tools and inference engines to be used across a variety of devices and platforms; its primary goal is to enable networks to be exported from a deep learning framework and imported into a hardware vendor's inference engine.
ONNX defines a common set of operators, the building blocks of machine learning and deep learning models, and a common file format, to allow Artificial Intelligence (AI) developers to use models with a variety of frameworks, tools, runtimes, encoders and decoders.
Both formats can store neural network models generated by common deep learning frameworks; their purpose is to enable the exchange and portability of neural network models across different deep learning frameworks.
According to the scheme provided in the present application, converting the model parameters of the neural network model into the target format enables the exchange and portability of the neural network model across different deep learning frameworks; in addition, compressing the converted target format and transmitting the compressed model parameters further saves bandwidth.
Optionally, in some embodiments, the compressing the model parameters of the target format includes: compressing the model parameters of the target format by using the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to obtain an NNR code stream;
the transmitting the compressed model parameters over the second communication link includes: and transmitting the code stream of the NNR through the second communication link.
Correspondingly, for a decoding end, receiving the code stream of the NNR through the second communication link; and decompressing the code stream of the NNR.
In the embodiment of the application, NNR is a method for representing and compressing a neural network model in a manner similar to video compression coding. The neural network model parameters are compressed into an NNR bitstream composed of a plurality of NNR units by means of weight sparsification, network parameter pruning, quantization, low-rank approximation, predictive coding, entropy coding and the like.
In the embodiment of the present application, the training of the neural network model may be performed in the following manner.
Fig. 7 is a schematic flowchart of a process for training a neural network model according to an embodiment of the present disclosure. Referring to fig. 7, the image to be encoded is compression-encoded through the encoder and the neural network computing platform to obtain the neural network model; the model parameters of the neural network model are converted into the NNEF format or the ONNX format, and the converted model parameters are then compressed by the NNR compression method of MPEG to obtain the NNR code stream.
When the encoding end encodes the image to be encoded by using the encoding technique based on the neural network, the encoding end may encode the image based on the process shown in fig. 8 a.
Fig. 8a is a schematic flowchart of a video encoder applying an intelligent encoding technique according to an embodiment of the present application. Referring to fig. 8a, after the NNR code stream is obtained, it may be decompressed to recover the MPEG NNR representation; reverse conversion of the NNEF or ONNX format then yields the model parameters of the neural network model, which are deployed on the neural network computing platform. When the encoder encodes an image or video, it may encode in combination with the neural network computing platform on which the model parameters of the neural network model are deployed, and generate the encoded code stream.
In a specific implementation, encoding by the neural-network-based encoding technique may be implemented by modifying a conventional encoder. For example, a syntax element may be added to the Sequence Parameter Set (SPS), the Picture Parameter Set (PPS) or the Slice Header as a switch controlling whether the neural-network-based coding technique is turned on or off; the syntax element may be added directly to the parameter set, or optionally to the User Extension Data of the parameter set.
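As an illustration only, such a switch could be a one-bit flag appended to the user extension data of a parameter set; the function names and bit layout below are assumptions, not syntax defined by any standard.

```python
def write_nn_switch(extension_bits: list, nn_tools_enabled: bool) -> None:
    # Append a hypothetical one-bit syntax element controlling the
    # neural-network-based coding technique: 1 = on, 0 = off.
    extension_bits.append(1 if nn_tools_enabled else 0)

def read_nn_switch(extension_bits: list, pos: int) -> bool:
    return extension_bits[pos] == 1

sps_user_extension = []          # stands in for SPS user extension data
write_nn_switch(sps_user_extension, True)
assert read_nn_switch(sps_user_extension, 0)
```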
For the decoding side, decoding may be performed based on the procedure shown in fig. 8 b.
Fig. 8b is a schematic flowchart of a video decoder applying an intelligent coding technique according to an embodiment of the present application. Referring to fig. 8b, after the NNR code stream is received, it may be decompressed to recover the MPEG NNR representation; reverse conversion of the NNEF or ONNX format then yields the model parameters of the neural network model, which are deployed on the neural network computing platform. When the decoder decodes the code stream of the image or video to be decoded, it may decode in combination with the neural network computing platform on which the model parameters of the neural network model are deployed, so as to obtain the decoded image or video.
Optionally, in some embodiments, the compressing the model parameters of the target format includes: compressing the model parameters of the target format by using the compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to obtain compressed data;
the transmitting the compressed model parameters over the second communication link includes: transmitting the compressed data over the second communication link.
Accordingly, for the decoding end, receiving the compressed data through the second communication link; and decompresses the compressed data.
In the embodiment of the application, the encoding end compresses the model parameters in the target format by using the AITISA compression method, and transmits the resulting compressed data through the second communication link; the decoding end receives the compressed data and decompresses it. The specific implementation process is similar to the compression encoding and decoding process using MPEG NNR described above, and is not described again here.
The compressed data in the embodiment of the present application may also be referred to as a compressed code stream, which is not limited.
Optionally, in some embodiments, the code stream further includes third indication information, where the third indication information is used to indicate whether encoding by using an encoding technique of a neural network is started.
In this embodiment of the application, the code stream transmitted through the first communication link may further include third indication information indicating whether the encoding end starts encoding by using an encoding technology of a neural network during encoding. After receiving the code stream, the decoding end can determine whether to decode by using a decoding technology of the neural network according to the third indication information in the code stream.
If the third indication information indicates that encoding by using the encoding technology of the neural network is started, the decoding end decodes by using the decoding technology of the neural network; if the third indication information indicates that the encoding technology using the neural network is not started, the decoding end does not use the decoding technology of the neural network for decoding.
Exemplarily, assume that "1" and "0" respectively indicate that encoding by the neural network encoding technique is turned on and off. If the third indication information in the code stream indicates "1", the decoding end, after receiving the indication information, determines to decode by using the neural network decoding technique; if the third indication information in the code stream indicates "0", the decoding end, after receiving the indication information, determines not to decode by using the neural network decoding technique.
It should be understood that using "1" and "0" to denote turning encoding by the neural network encoding technique on and off is only an example; other identifiers (such as "a", "b", etc.) may also be used, and this should not be construed as limiting the present application.
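A decoder-side sketch of this branch follows; `parse_third_indication`, `nn_decode`, and `conventional_decode` are hypothetical callables assumed for illustration.

```python
def decode_code_stream(bitstream, parse_third_indication,
                       nn_decode, conventional_decode):
    # "1": the encoder turned on the neural network encoding technique;
    # "0": it did not. Other identifiers would work equally well.
    if parse_third_indication(bitstream) == "1":
        return nn_decode(bitstream)
    return conventional_decode(bitstream)
```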
According to the scheme provided in the present application, third indication information is added to the code stream to indicate whether encoding by the neural network encoding technique is turned on, and the decoding end, after receiving the code stream, determines according to this third indication information whether to decode by using the neural network decoding technique, which further improves decoding accuracy.
Optionally, in some embodiments, the transmitting, through the second communication link, the model parameters of the neural network model included in the neural network-based encoding technique includes:
transmitting part of or all of the model parameters of the neural network model through the second communication link.
Accordingly, for the decoding end, part of the model parameters or all the model parameters of the neural network model are received through the second communication link.
In this embodiment, the model parameters of the neural network model transmitted through the second communication link may be part or all of the model parameters of the neural network model. The partial model parameters may include, but are not limited to, the number of layers of the neural network and the number of neurons, while the other model parameters (including, but not limited to, the weights of the connections between neurons) may be preset at the encoding end and the decoding end.
After receiving part of the model parameters of the neural network model through the second communication link, the decoding end may decode the code stream of the image to be decoded in combination with the other model parameters preset at the decoding end; alternatively, the decoding end receives all the model parameters of the neural network model through the second communication link and decodes the code stream of the image to be decoded with them.
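A sketch of the partial-parameter case follows, assuming for illustration that the architecture fields travel over the second link while the connection weights are preset at both ends; the field names are hypothetical.

```python
# Hypothetical split: architecture fields travel over the second link,
# connection weights are preset at both ends (field names illustrative).
PRESET_PARAMS = {"weights": "...preset at encoder and decoder..."}

def assemble_model(received_partial: dict) -> dict:
    model = dict(PRESET_PARAMS)      # start from the preset remainder
    model.update(received_partial)   # add what arrived over link 2
    return model

model = assemble_model({"num_layers": 8, "neurons_per_layer": 64})
```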
According to the scheme provided in the present application, when the encoding end transmits part of the model parameters of the neural network model through the second communication link and the other model parameters are preset at the encoding end and the decoding end, transmission bandwidth is saved while decoding accuracy is still ensured when the decoding end decodes the code stream; when the encoding end transmits all the model parameters of the neural network model through the second communication link, decoding accuracy is likewise ensured when the decoding end decodes the code stream.
The method embodiment of the present application is described in detail above with reference to fig. 1 to 8b, and the apparatus embodiment of the present application is described below with reference to fig. 9 to 13, and the apparatus embodiment and the method embodiment correspond to each other, so that the parts not described in detail can be referred to the method embodiments of the previous parts.
Fig. 9 is an encoding apparatus 900 according to an embodiment of the present application, where the encoding apparatus 900 may include a processor 910.
A processor 910, configured to encode an image to be encoded by using a coding technique based on a neural network, so as to obtain a code stream of the image to be encoded;
transmitting the code stream of the image to be coded through a first communication link;
model parameters of a neural network model included in the neural network-based encoding technique are transmitted over a second communication link.
The first communication link may be provided by a first communication module, and the second communication link may be provided by a second communication module. For example, the first communication module is an SDR module or a WiFi module, and the second communication module is a 4G/5G module.
Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
Optionally, in some embodiments, the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
Optionally, in some embodiments, the first communication link includes a link whose latency is less than or equal to a first threshold, and the second communication link includes a link whose bandwidth is greater than or equal to a second threshold.
Optionally, in some embodiments, the first communication link comprises a link based on a private image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
Optionally, in some embodiments, the private image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
Optionally, in some embodiments, the first communication link comprises a private image transmission link, and the second communication link comprises a public network transmission link.
Optionally, in some embodiments, the neural network model comprises an offline trained neural network model or an online trained neural network model.
Optionally, in some embodiments, the neural network model is an online trained neural network model, and the processor 910 is further configured to:
for an nth target image, coding the nth target image by utilizing a neural network model obtained by training the nth-m target images, wherein the target image is an image obtained by dividing the image to be coded according to any one of a video sequence level, an image group level and an image level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
Optionally, in some embodiments, the processor 910 is further configured to:
and filtering the nth target image by using a neural network model obtained by training the nth-m target images, wherein the nth-m target images are images of the nth-m coded images which are not filtered by the neural network model, and the nth target image is an image of the nth coded image which is not filtered by the neural network model.
Optionally, in some embodiments, m is a parameter fixed at or preset at the encoding end before encoding; or m is a parameter formed during the encoding process.
Optionally, in some embodiments, the neural network model is an online trained neural network model, and the processor 910 is further configured to:
for a first target image to be coded, coding the first target image by using a neural network model obtained by training a coded second target image, wherein the first target image and the second target image are images obtained by dividing the image to be coded according to any one of a video sequence level, an image group level and an image level;
and transmitting model parameters of the neural network model obtained by training the second target image through the second communication link.
Optionally, in some embodiments, the first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
Optionally, in some embodiments, q is a parameter fixed at or preset at the encoding end before encoding.
Optionally, in some embodiments, the neural network model is an offline trained neural network model, and the processor 910 is further configured to:
and for the p target image, encoding the p target image by using the offline-trained neural network model, wherein the target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level and an image level, and p is an integer greater than or equal to 0.
Optionally, in some embodiments, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model used when the image to be encoded is encoded.
Optionally, in some embodiments, if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model used when the image to be encoded is encoded.
Optionally, in some embodiments, the first indication information is further used for indicating identification of model parameters of the neural network model adopted when encoding other frames from the current key frame to the next key frame.
Optionally, in some embodiments, the processor 910 is further configured to:
and transmitting the model parameters of the neural network model and the identification corresponding to the model parameters of the neural network model through the second communication link.
Optionally, in some embodiments, the code stream of the image to be encoded further includes second indication information, where the second indication information is used to indicate that the neural network model is trained based on one of an image group, a frame, or a sequence.
Optionally, in some embodiments, the processor 910 is further configured to:
converting the model parameters of the neural network model into a target format;
compressing the model parameters of the target format to obtain compressed model parameters;
transmitting the compressed model parameters over the second communication link.
Optionally, in some embodiments, the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
Optionally, in some embodiments, the processor 910 is further configured to:
compressing the model parameters of the target format by using the Neural Network Representation (NNR) compression method of the Moving Picture Experts Group (MPEG) to obtain an NNR code stream;
and transmitting the code stream of the NNR through the second communication link.
Optionally, in some embodiments, the processor 910 is further configured to:
compressing the model parameters of the target format by using the compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to obtain compressed data;
transmitting the compressed data over the second communication link.
Optionally, in some embodiments, the processor 910 is further configured to:
utilizing the neural network based encoding technique to at least one of predict, transform, quantize, entropy encode, or filter the image to be encoded.
Optionally, in some embodiments, the code stream of the image to be encoded further includes third indication information, where the third indication information is used to indicate whether encoding using an encoding technique of a neural network is started.
Optionally, in some embodiments, the processor 910 is further configured to:
transmitting part of or all of the model parameters of the neural network model through the second communication link.
Optionally, in some embodiments, the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, a link based on an ethernet protocol.
Fig. 10 is a decoding apparatus 1000 according to an embodiment of the present application, where the decoding apparatus 1000 may include a processor 1010.
A processor 1010 configured to receive a code stream of an image to be decoded through a first communication link;
receiving model parameters of the neural network model over a second communication link;
and decoding the code stream by using the model parameters of the neural network model to obtain a decoded image.
Optionally, in some embodiments, the first communication link and the second communication link have different physical characteristics.
Optionally, in some embodiments, the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
Optionally, in some embodiments, the first communication link includes a link whose latency is less than or equal to a first threshold, and the second communication link includes a link whose bandwidth is greater than or equal to a second threshold.
Optionally, in some embodiments, the first communication link comprises a link based on a private image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
Optionally, in some embodiments, the private image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
Optionally, in some embodiments, the first communication link comprises a private image transmission link and the second communication link comprises a public network transmission link.
Optionally, in some embodiments, the neural network model comprises an offline trained neural network model or an online trained neural network model.
Optionally, in some embodiments, the code stream further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model adopted by a coding end when an image to be coded is coded;
the processor 1010 is further configured to:
and decoding the code stream by using the model parameter of the neural network model corresponding to the identifier of the model parameter.
Optionally, in some embodiments, if the image to be decoded in the code stream is a key frame, the code stream further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model adopted by an encoding end when the image to be encoded is encoded;
the processor 1010 is further configured to:
and decoding the code stream by using the model parameters of the neural network model and the identification of the model parameters.
Optionally, in some embodiments, the first indication information is further used for indicating identification of model parameters of the neural network model adopted when decoding other frames from the current key frame to the next key frame.
Optionally, in some embodiments, the neural network model is an online trained neural network model, and the processor 1010 is further configured to:
decoding a code stream of a first target image by using a received model parameter of a neural network model obtained by training a second target image to obtain the first target image, wherein the first target image and the second target image are images to be decoded, and the images are divided according to any one of a video sequence level, an image group level and an image level.
Optionally, in some embodiments, the first target image and the second target image are separated by q target images, where q is an integer greater than or equal to 0.
Optionally, in some embodiments, q is a parameter fixed at or preset at the decoding end before decoding.
Optionally, in some embodiments, the processor 1010 is further configured to:
and receiving the model parameters of the neural network model and the identification corresponding to the model parameters of the neural network model through the second communication link.
Optionally, in some embodiments, the codestream further includes second indication information, where the second indication information is used to indicate that the neural network model is trained based on one of a group of pictures, a frame, or a sequence.
Optionally, in some embodiments, the processor 1010 is further configured to:
receiving the compressed model parameters over the second communication link;
decompressing the compressed model parameters to obtain a target format;
converting the target format back;
and decoding the code stream by using the model parameters of the converted format.
Optionally, in some embodiments, the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
Optionally, in some embodiments, the processor 1010 is further configured to:
receiving a Neural Network Representation (NNR) code stream through the second communication link;
and decompressing the code stream of the NNR.
Optionally, in some embodiments, the processor 1010 is further configured to:
receiving compressed data over the second communication link;
decompressing the compressed data.
Optionally, in some embodiments, the processor 1010 is further configured to:
and performing at least one of entropy decoding, inverse quantization, inverse transformation, prediction reconstruction or filtering on the code stream by using the model parameters of the neural network.
Optionally, in some embodiments, the code stream further includes third indication information, where the third indication information is used to indicate whether encoding by using an encoding technique of a neural network is started.
Optionally, in some embodiments, the processor 1010 is further configured to:
receiving a portion of model parameters or all of the model parameters of the neural network model over the second communication link.
Optionally, in some embodiments, the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, a link based on an ethernet protocol.
Fig. 11 is a schematic structural diagram of an encoding device according to still another embodiment of the present application. The encoding apparatus 1100 shown in fig. 11 includes a processor 1110, and the processor 1110 can call and run a computer program from a memory to implement the encoding method described in the embodiment of the present application.
Optionally, as shown in fig. 11, the encoding apparatus 1100 may further include a memory 1120. From the memory 1120, the processor 1110 can call and run a computer program to implement the encoding method in the embodiment of the present application.
The memory 1120 may be a separate device from the processor 1110, or may be integrated into the processor 1110.
Optionally, as shown in fig. 11, the encoding apparatus 1100 may further include a transceiver 1130, and the processor 1110 may control the transceiver 1130 to communicate with other apparatuses, and specifically, may transmit information or data to the other apparatuses or receive information or data transmitted by the other apparatuses.
Optionally, the encoding device may be, for example, an encoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the encoding device may implement the corresponding processes in each encoding method in the embodiments of the present application, which are not described again here for brevity.
Fig. 12 is a schematic structural diagram of a decoding device according to still another embodiment of the present application. The decoding apparatus 1200 shown in fig. 12 includes a processor 1210, and the processor 1210 can call and run a computer program from a memory to implement the decoding method described in the embodiment of the present application.
Optionally, as shown in fig. 12, the decoding apparatus 1200 may further include a memory 1220. The processor 1210 may call and run a computer program from the memory 1220 to implement the decoding method in the embodiment of the present application.
The memory 1220 may be a separate device from the processor 1210, or may be integrated into the processor 1210.
Optionally, as shown in fig. 12, the decoding apparatus 1200 may further include a transceiver 1230, and the processor 1210 may control the transceiver 1230 to communicate with other apparatuses, and specifically, may transmit information or data to or receive information or data transmitted by other apparatuses.
Optionally, the decoding device may be, for example, a decoder or a terminal (including but not limited to a mobile phone, a camera, an unmanned aerial vehicle, etc.), and the decoding device may implement corresponding processes in each decoding method in the embodiments of the present application; for brevity, details are not described here again.
Fig. 13 is a schematic structural diagram of a chip of an embodiment of the present application. The chip 1300 shown in fig. 13 includes a processor 1310, and the processor 1310 may call and run a computer program from a memory to implement the encoding method or the decoding method in the embodiment of the present application.
Optionally, as shown in fig. 13, the chip 1300 may further include a memory 1320. The processor 1310 may call and run a computer program from the memory 1320 to implement the encoding method or the decoding method in the embodiment of the present application.
The memory 1320 may be a separate device from the processor 1310, or may be integrated into the processor 1310.
Optionally, the chip 1300 may further include an input interface 1330. The processor 1310 may control the input interface 1330 to communicate with other devices or chips, and in particular, may obtain information or data transmitted by other devices or chips.
Optionally, the chip 1300 may further include an output interface 1340. The processor 1310 may control the output interface 1340 to communicate with other devices or chips, and in particular, may output information or data to the other devices or chips.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-level chip, a chip system, or a system-on-chip (SoC), etc.
It should be understood that the processor of the embodiments of the present application may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method embodiments may be performed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware decoding processor, or performed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in a memory, and the processor reads information in the memory and completes the steps of the above method in combination with its hardware.
It will be appreciated that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which acts as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The memory in the embodiments of the present application may provide instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information. The processor may be configured to execute the instructions stored in the memory, and when the processor executes the instructions, the processor may perform the steps corresponding to the terminal device in the above method embodiments.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of the method disclosed in connection with the embodiments of the present application may be directly embodied as being performed by a hardware processor, or performed by a combination of hardware and software modules in a processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or a register. The storage medium is located in a memory, and the processor executes the instructions in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, details are not described here again.
It should also be understood that, in the embodiments of the present application, the pixels of a region A in an image may be located in different rows and/or columns, where the length of A may correspond to the number of pixels of A located in the same row, and the height of A may correspond to the number of pixels of A located in the same column. In addition, the length and the height of A may also be referred to as the width and the depth of A, respectively, which is not limited in this application.
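Under the usual row-major array convention, these definitions reduce directly to the array shape; the NumPy representation below is an assumption used only for illustration.

```python
import numpy as np

# A: a luma region stored row-major, one array row per row of pixels.
A = np.zeros((720, 1280), dtype=np.uint8)

height = A.shape[0]  # number of pixels of A located in the same column
length = A.shape[1]  # number of pixels of A located in the same row (the width)
```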
It should also be understood that, in the embodiments of the present application, a "boundary spaced apart from A" may refer to being at least one pixel away from the boundary of A, which may also be expressed as "not adjacent to the boundary of A" or "not located at the boundary of A"; this is not limited in the embodiments of the present application, where A may be an image, a rectangular area, a sub-image, or the like.
It should also be understood that the foregoing descriptions of the embodiments of the present application focus on the differences between the various embodiments; for the same or similar parts that are not mentioned, the embodiments may be referred to one another, and for brevity they are not repeated here.
The embodiment of the application also provides a computer readable storage medium for storing the computer program.
Optionally, the computer-readable storage medium may be applied to the encoding device or the decoding device in the embodiment of the present application, and the computer program enables a computer to execute corresponding processes implemented by the encoding device or the decoding device in the methods in the embodiments of the present application, which are not described herein again for brevity.
Embodiments of the present application also provide a computer program product comprising computer program instructions.
Optionally, the computer program product may be applied to the encoding device or the decoding device in the embodiment of the present application, and the computer program instructions enable the computer to execute corresponding processes implemented by the encoding device or the decoding device in the methods in the embodiments of the present application, which are not described herein again for brevity.
The embodiment of the application also provides a computer program.
Optionally, the computer program may be applied to the encoding device or the decoding device in the embodiment of the present application, and when the computer program runs on a computer, the computer executes a corresponding process implemented by the encoding device or the decoding device in each method in the embodiment of the present application, which is not described herein again for brevity.
It should be understood that, in the embodiments of the present application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two; to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above generally in terms of their functions. Whether such functions are performed in hardware or software depends on the particular application and the design constraints imposed on the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation should not be considered to go beyond the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may also be an electric, mechanical or other form of connection.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the elements may be selected according to actual needs to achieve the purpose of the solution of the embodiments of the present application.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think of various equivalent modifications or substitutions within the technical scope of the present application, and these modifications or substitutions should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (108)

1. A method of encoding, comprising:
encoding an image to be encoded by using a neural network-based encoding technique to obtain a code stream of the image to be encoded;
transmitting the code stream of the image to be encoded through a first communication link; and
transmitting model parameters of a neural network model included in the neural network-based encoding technique through a second communication link.
2. The method of claim 1, wherein the first communication link and the second communication link have different physical characteristics.
3. The method according to claim 1 or 2, wherein the transmission delay of the first communication link is lower than that of the second communication link, and/or the transmission bandwidth of the second communication link is higher than that of the first communication link.
4. The method according to any of claims 1 to 3, wherein the first communication link comprises a link having a latency less than or equal to a first threshold, and the second communication link comprises a link having a bandwidth greater than or equal to a second threshold.
5. The method according to any one of claims 1 to 3, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
6. The method of claim 5, wherein the proprietary image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
7. The method of claim 1 or 2, wherein the first communication link comprises a proprietary image transmission link and the second communication link comprises a public network transmission link.
8. The method of any one of claims 1 to 7, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
9. The method according to claim 1 or 8, wherein the neural network model is an online trained neural network model, and the encoding the image to be encoded by using a neural network-based encoding technique comprises:
for an nth target image, encoding the nth target image by using a neural network model obtained by training the (n-m)th target image, wherein a target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
10. The method of claim 9, wherein the encoding the nth target image by using the neural network model obtained by training the (n-m)th target image comprises:
filtering the nth target image by using the neural network model obtained by training the (n-m)th target image, wherein the (n-m)th target image is an image of the (n-m)th encoded image that has not been filtered by the neural network model, and the nth target image is an image of the nth encoded image that has not been filtered by the neural network model.
11. The method according to claim 9 or 10, wherein m is a parameter built into or preset at the encoding end before encoding, or m is a parameter generated during the encoding process.
12. The method according to claim 1 or 8, wherein the neural network model is an online trained neural network model, and the encoding the image to be encoded by using a neural network-based encoding technique comprises:
for a first target image to be encoded, encoding the first target image by using a neural network model obtained by training an encoded second target image, wherein the first target image and the second target image are images obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level;
the transmitting model parameters of the neural network model included in the neural network-based encoding technique over the second communication link includes:
and transmitting model parameters of the neural network model obtained by training the second target image through the second communication link.
13. The method of claim 12, wherein the first target image and the second target image are spaced apart by q target images, wherein q is an integer greater than or equal to 0.
14. The method of claim 13, wherein q is a parameter built into or preset at the encoding end before encoding.
15. The method according to claim 1 or 8, wherein the neural network model is an offline-trained neural network model, and the encoding the image to be encoded by using a neural network-based encoding technique comprises:
for a pth target image, encoding the pth target image by using the offline-trained neural network model, wherein a target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level, and p is an integer greater than or equal to 0.
16. The method according to any one of claims 1 to 11 or 15, wherein the code stream of the image to be encoded further includes first indication information, and the first indication information is used for indicating an identifier of a model parameter of a neural network model adopted when the image to be encoded is encoded.
17. The method according to any one of claims 1 to 11 or 15, wherein if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, and the first indication information is used for indicating an identifier of a model parameter of a neural network model adopted when the image to be encoded is encoded.
18. The method of claim 17, wherein the first indication information is further used for indicating the identifier of the model parameters of the neural network model used when encoding the other frames from the current key frame to the next key frame.
19. The method according to any one of claims 1 to 18, wherein said transmitting model parameters of the neural network model included in the neural network-based encoding technique via the second communication link comprises:
transmitting the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model through the second communication link.
20. The method according to any one of claims 1 to 19, wherein the code stream of the image to be encoded further includes second indication information, and the second indication information is used to indicate that the neural network model is trained on the basis of one of a group of pictures, a frame, or a sequence.
21. The method according to any one of claims 1 to 20, further comprising:
converting the model parameters of the neural network model into a target format;
compressing the model parameters of the target format to obtain compressed model parameters;
the transmitting model parameters of the neural network model included in the neural network-based encoding technique over the second communication link includes:
transmitting the compressed model parameters over the second communication link.
22. The method of claim 21, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
23. The method according to claim 21 or 22, wherein compressing the model parameters of the target format comprises:
compressing the model parameters in the target format by using the Moving Picture Experts Group (MPEG) neural network representation (NNR) compression method to obtain a code stream of the NNR;
the transmitting the compressed model parameters over the second communication link includes:
transmitting the code stream of the NNR through the second communication link.
24. The method according to claim 21 or 22, wherein compressing the model parameters of the target format comprises:
compressing the model parameters in the target format by using a compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to obtain compressed data;
the transmitting the compressed model parameters over the second communication link includes:
transmitting the compressed data over the second communication link.
25. The method according to any one of claims 1 to 24, wherein the encoding the image to be encoded by using a neural network-based encoding technique comprises:
performing at least one of prediction, transformation, quantization, entropy encoding, or filtering on the image to be encoded by using the neural network-based encoding technique.
26. The method according to any one of claims 1 to 25, wherein the code stream of the image to be encoded further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
27. The method according to any one of claims 1 to 26, wherein said transmitting model parameters of the neural network model included in the neural network-based encoding technique via the second communication link comprises:
transmitting some or all of the model parameters of the neural network model through the second communication link.
28. The method according to any of claims 1 to 27, wherein the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, or a link based on an Ethernet protocol.
29. A method of decoding, comprising:
receiving a code stream of an image to be decoded through a first communication link;
receiving model parameters of the neural network model over a second communication link;
and decoding the code stream by using the model parameters of the neural network model to obtain a decoded image.
30. The method of claim 29, wherein the first communication link and the second communication link have different physical characteristics.
31. The method according to claim 29 or 30, wherein the first communication link has a lower transmission delay than the second communication link and/or wherein the second communication link has a higher transmission bandwidth than the first communication link.
32. The method of any of claims 29 to 31, wherein the first communication link comprises a link having a latency less than or equal to a first threshold, and wherein the second communication link comprises a link having a bandwidth greater than or equal to a second threshold.
33. The method according to any one of claims 29 to 31, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
34. The method of claim 33, wherein the proprietary image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
35. The method of claim 29 or 30, wherein the first communication link comprises a proprietary image transmission link and the second communication link comprises a public network transmission link.
36. The method of any one of claims 29 to 35, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
37. The method according to any one of claims 29 to 36, wherein the code stream further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model adopted by an encoding end when encoding an image to be encoded;
the decoding the code stream by using the model parameters of the neural network model comprises the following steps:
decoding the code stream by using the model parameters of the neural network model corresponding to the identifier of the model parameters.
38. The method according to any one of claims 29 to 36, wherein if the image to be decoded in the code stream is a key frame, the code stream further includes first indication information, where the first indication information is used to indicate an identifier of a model parameter of a neural network model adopted by an encoding end when the image to be encoded is encoded;
the decoding the code stream by using the model parameters of the neural network model comprises the following steps:
decoding the code stream by using the model parameters of the neural network model and the identifier of the model parameters.
39. The method of claim 38, wherein the first indication information is further used for indicating the identifier of the model parameters of the neural network model used when decoding the other frames from the current key frame to the next key frame.
40. The method according to claim 29 or 36, wherein the neural network model is an online trained neural network model, and the decoding the code stream by using the model parameters of the neural network model to obtain the decoded image comprises:
decoding a code stream of a first target image by using received model parameters of a neural network model obtained by training a second target image, to obtain the first target image, wherein the first target image and the second target image are images obtained by dividing the image to be decoded according to any one of a video sequence level, an image group level, and an image level.
41. The method of claim 40, wherein the first target image and the second target image are spaced apart by q target images, wherein q is an integer greater than or equal to 0.
42. The method of claim 41, wherein q is a parameter built into or preset at the decoding end before decoding.
43. The method of any one of claims 29 to 42, wherein receiving model parameters of a neural network model over a second communication link comprises:
receiving the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model through the second communication link.
44. The method according to any one of claims 29 to 43, wherein the code stream further includes second indication information indicating that the neural network model is trained on the basis of one of a group of pictures, a frame, or a sequence.
45. The method of any one of claims 29 to 44, wherein receiving model parameters of a neural network model over a second communication link comprises:
receiving the compressed model parameters over the second communication link;
the decoding the code stream by using the model parameters of the neural network model comprises the following steps:
decompressing the compressed model parameters to obtain model parameters in a target format;
converting the model parameters from the target format;
decoding the code stream by using the format-converted model parameters.
46. The method of claim 45, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
47. The method according to claim 45 or 46, wherein the receiving compressed model parameters over the second communication link comprises:
receiving a code stream of the neural network representation (NNR) through the second communication link;
the decompressing the compressed model parameters includes:
decompressing the code stream of the NNR.
48. The method according to claim 45 or 46, wherein the receiving compressed model parameters over the second communication link comprises:
receiving compressed data over the second communication link;
the decompressing the compressed model parameters includes:
decompressing the compressed data.
49. The method according to any one of claims 29 to 48, wherein the decoding the code stream by using the model parameters of the neural network model comprises:
performing at least one of entropy decoding, inverse quantization, inverse transformation, prediction reconstruction, or filtering on the code stream by using the model parameters of the neural network model.
50. The method according to any one of claims 29 to 49, wherein the code stream further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
51. The method of any one of claims 29 to 50, wherein receiving model parameters of a neural network model over a second communication link comprises:
receiving some or all of the model parameters of the neural network model over the second communication link.
52. The method according to any of claims 29 to 51, wherein the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, or a link based on an Ethernet protocol.
53. An encoding apparatus, comprising:
a processor, configured to: encode an image to be encoded by using a neural network-based encoding technique to obtain a code stream of the image to be encoded;
transmit the code stream of the image to be encoded through a first communication link; and
transmit model parameters of a neural network model included in the neural network-based encoding technique through a second communication link.
54. The encoding device of claim 53, wherein the first communication link and the second communication link have different physical characteristics.
55. The encoding device according to claim 53 or 54, wherein the transmission latency of the first communication link is lower than that of the second communication link, and/or wherein the transmission bandwidth of the second communication link is higher than that of the first communication link.
56. The encoding apparatus according to any one of claims 53 to 55, wherein the first communication link comprises a link having a latency less than or equal to a first threshold, and the second communication link comprises a link having a bandwidth greater than or equal to a second threshold.
57. The encoding apparatus according to any one of claims 53 to 55, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
58. The encoding apparatus of claim 57, wherein the proprietary image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
59. The encoding apparatus of claim 53, wherein the first communication link comprises a proprietary image transmission link and the second communication link comprises a public network transmission link.
60. The encoding apparatus of any one of claims 53 through 59, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
61. The encoding apparatus of claim 53 or 60, wherein the neural network model is an online trained neural network model, and the processor is further configured to:
for an nth target image, encoding the nth target image by using a neural network model obtained by training the (n-m)th target image, wherein a target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level; n is an integer greater than or equal to 1, and m is an integer greater than or equal to 1.
62. The encoding apparatus of claim 61, wherein the processor is further configured to:
filtering the nth target image by using the neural network model obtained by training the (n-m)th target image, wherein the (n-m)th target image is an image of the (n-m)th encoded image that has not been filtered by the neural network model, and the nth target image is an image of the nth encoded image that has not been filtered by the neural network model.
63. The encoding apparatus according to claim 61 or 62, wherein m is a parameter built into or preset at the encoding end before encoding, or m is a parameter generated during the encoding process.
64. The encoding apparatus of claim 53 or 60, wherein the neural network model is an online trained neural network model, and the processor is further configured to:
for a first target image to be encoded, encoding the first target image by using a neural network model obtained by training an encoded second target image, wherein the first target image and the second target image are images obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level;
and transmitting model parameters of the neural network model obtained by training the second target image through the second communication link.
65. The encoding apparatus according to claim 64, wherein the first target image and the second target image are spaced apart by q target images, and q is an integer greater than or equal to 0.
66. The encoding apparatus according to claim 65, wherein q is a parameter built into or preset at the encoding end before encoding.
67. The encoding apparatus of claim 53 or 60, wherein the neural network model is an offline-trained neural network model, and wherein the processor is further configured to:
for a pth target image, encoding the pth target image by using the offline-trained neural network model, wherein a target image is an image obtained by dividing the image to be encoded according to any one of a video sequence level, an image group level, and an image level, and p is an integer greater than or equal to 0.
68. The encoding device according to any one of claims 53 to 63 or 67, wherein the code stream of the image to be encoded further includes first indication information, and the first indication information is used for indicating an identifier of a model parameter of a neural network model used when the image to be encoded is encoded.
69. The encoding device according to any one of claims 53 to 63 or 67, wherein if the image to be encoded is a key frame, the code stream of the image to be encoded further includes first indication information, and the first indication information is used to indicate an identifier of a model parameter of a neural network model used when the image to be encoded is encoded.
70. The encoding apparatus of claim 69, wherein the first indication information is further used for indicating the identifier of the model parameters of the neural network model used when encoding the other frames from the current key frame to the next key frame.
71. The encoding apparatus of any one of claims 53 through 70, wherein the processor is further configured to:
transmitting the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model through the second communication link.
72. The encoding apparatus according to any one of claims 53 to 71, wherein the code stream of the image to be encoded further includes second indication information, and the second indication information is used to indicate that the neural network model is trained on the basis of one of a group of pictures, a frame, or a sequence.
73. The encoding apparatus of any one of claims 53 through 72, wherein the processor is further configured to:
converting the model parameters of the neural network model into a target format;
compressing the model parameters of the target format to obtain compressed model parameters;
transmitting the compressed model parameters over the second communication link.
74. The encoding apparatus according to claim 73, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
75. The encoding apparatus of claim 73 or 74, wherein the processor is further configured to:
compressing the model parameters in the target format by using the Moving Picture Experts Group (MPEG) neural network representation (NNR) compression method to obtain a code stream of the NNR;
and transmitting the code stream of the NNR through the second communication link.
76. The encoding apparatus of claim 73 or 74, wherein the processor is further configured to:
compressing the model parameters in the target format by using a compression method of the Artificial Intelligence Industry Technology Innovation Strategic Alliance (AITISA) to obtain compressed data;
transmitting the compressed data over the second communication link.
77. The encoding device of any one of claims 53-76, wherein the processor is further configured to:
performing at least one of prediction, transformation, quantization, entropy encoding, or filtering on the image to be encoded by using the neural network-based encoding technique.
78. The encoding apparatus according to any one of claims 53 to 77, wherein the code stream of the image to be encoded further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
79. The encoding apparatus of any one of claims 53 through 78, wherein the processor is further configured to:
transmitting some or all of the model parameters of the neural network model through the second communication link.
80. The encoding apparatus according to any one of claims 53 to 79, wherein the first communication link and/or the second communication link is selected from one or more of:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, or a link based on an Ethernet protocol.
81. A decoding apparatus, comprising:
a processor, configured to: receive a code stream of an image to be decoded through a first communication link;
receive model parameters of a neural network model through a second communication link; and
decode the code stream by using the model parameters of the neural network model to obtain a decoded image.
82. The decoding apparatus of claim 81, wherein the first communication link and the second communication link have different physical characteristics.
83. The decoding device according to claim 81 or 82, wherein the transmission latency of the first communication link is lower than that of the second communication link, and/or wherein the transmission bandwidth of the second communication link is higher than that of the first communication link.
84. The decoding device according to any of claims 81-83, wherein the first communication link comprises a link having a latency less than or equal to a first threshold, and the second communication link comprises a link having a bandwidth greater than or equal to a second threshold.
85. The decoding apparatus according to any one of claims 81 to 83, wherein the first communication link comprises a link based on a proprietary image transmission protocol or a wireless local area network protocol, and the second communication link comprises a link based on a mobile communication protocol.
86. The decoding apparatus according to claim 85, wherein the proprietary image transmission protocol comprises a software-defined radio (SDR) protocol, the wireless local area network protocol comprises a wireless fidelity (WiFi) protocol, and the mobile communication protocol comprises a 4G or 5G protocol.
87. The decoding apparatus of claim 81, wherein the first communication link comprises a proprietary image transmission link and the second communication link comprises a public network transmission link.
88. The decoding apparatus of any one of claims 81-87, wherein the neural network model comprises an offline trained neural network model or an online trained neural network model.
89. The decoding device according to any one of claims 81 to 88, wherein the code stream further comprises first indication information, and the first indication information is used for indicating an identifier of a model parameter of a neural network model adopted by an encoding terminal when encoding an image to be encoded;
the processor is further configured to:
decoding the code stream by using the model parameters of the neural network model corresponding to the identifier of the model parameters.
90. The decoding device according to any one of claims 81 to 88, wherein if the image to be decoded in the code stream is a key frame, the code stream further comprises first indication information, and the first indication information is used for indicating an identifier of a model parameter of a neural network model adopted by an encoding terminal when the image to be encoded is encoded;
the processor is further configured to:
decoding the code stream by using the model parameters of the neural network model and the identifier of the model parameters.
91. The decoding apparatus according to claim 90, wherein the first indication information is further used for indicating the identifier of the model parameters of the neural network model used when decoding the other frames from the current key frame to the next key frame.
92. The decoding apparatus of claim 81 or 88, wherein the neural network model is an online trained neural network model, and wherein the processor is further configured to:
decoding a code stream of a first target image by using received model parameters of a neural network model obtained by training a second target image, to obtain the first target image, wherein the first target image and the second target image are images obtained by dividing the image to be decoded according to any one of a video sequence level, an image group level, and an image level.
93. The decoding apparatus according to claim 92, wherein the first target image and the second target image are spaced apart by q target images, and q is an integer greater than or equal to 0.
94. The decoding apparatus according to claim 93, wherein q is a parameter built into or preset at the decoding end before decoding.
95. The decoding device of any one of claims 81-94, wherein the processor is further configured to:
receiving the model parameters of the neural network model and the identifier corresponding to the model parameters of the neural network model through the second communication link.
96. The decoding apparatus according to any one of claims 81 to 95, wherein the code stream further includes second indication information indicating that the neural network model is trained on the basis of one of a group of pictures, a frame, or a sequence.
97. The decoding device according to any one of claims 81-96, wherein the processor is further configured to:
receiving the compressed model parameters over the second communication link;
decompressing the compressed model parameters to obtain model parameters in a target format;
converting the model parameters from the target format;
decoding the code stream by using the format-converted model parameters.
98. The decoding apparatus according to claim 97, wherein the target format comprises the Neural Network Exchange Format (NNEF) or the Open Neural Network Exchange (ONNX) format.
99. The decoding device according to claim 97 or 98, wherein the processor is further configured to:
receiving a code stream of the neural network representation (NNR) through the second communication link;
decompressing the code stream of the NNR.
100. The decoding device according to claim 97 or 98, wherein the processor is further configured to:
receiving compressed data over the second communication link;
decompressing the compressed data.
101. The decoding device according to any one of claims 81-100, wherein the processor is further configured to:
performing at least one of entropy decoding, inverse quantization, inverse transformation, prediction reconstruction, or filtering on the code stream by using the model parameters of the neural network model.
102. The decoding apparatus according to any one of claims 81 to 101, wherein the code stream further includes third indication information, and the third indication information is used to indicate whether encoding using a neural network encoding technique is enabled.
103. The decoding device according to any one of claims 81-102, wherein the processor is further configured to:
receiving some or all of the model parameters of the neural network model over the second communication link.
104. The decoding device according to any of the claims 81-103, wherein the first communication link and/or the second communication link is selected from one or more of the following links:
a link based on a wireless local area network protocol, a link based on a mobile communication protocol, or a link based on an Ethernet protocol.
105. An encoding apparatus, comprising: a processor and a memory for storing a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform the method of any of claims 1 to 28.
106. A decoding apparatus, comprising: a processor and a memory for storing a computer program, the processor being configured to invoke and execute the computer program stored in the memory to perform the method of any of claims 29 to 52.
107. A computer-readable storage medium comprising instructions for performing the encoding method of any one of claims 1 to 28.
108. A computer-readable storage medium comprising instructions for performing the decoding method of any one of claims 29 to 52.
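Purely as a reading aid for the online-training claims above (for example, claims 9 to 14 and 61 to 66): one way the lag-m schedule could be realized at the encoding end is sketched below. Every helper name, the fallback model used for the first m images, and the sequential scheduling are assumptions for the example, not limitations of the claims.

```python
def encode_with_online_training(target_images, m, base_model,
                                train, encode, send_first_link,
                                send_second_link):
    """Encode the nth target image with a model trained on the (n-m)th one.

    All helpers are hypothetical: `train(image, init)` returns an updated
    model, `encode(image, model)` returns a code stream, and the two send
    functions stand for the first and second communication links.
    """
    trained = {}                                # n -> model trained on image n
    for n, image in enumerate(target_images, start=1):
        model = trained.get(n - m, base_model)  # fallback until n > m
        send_first_link(encode(image, model))   # code stream: first link
        trained[n] = train(image, init=model)   # train for later images
        send_second_link(trained[n])            # parameters: second link
```

With m = 1, the nth image is encoded with the model trained on image n-1, so encoding never waits for training of the current image to finish.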
CN202080078054.3A 2020-12-04 2020-12-04 Encoding method, decoding method, encoding device, and decoding device Active CN114731406B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/134085 WO2022116207A1 (en) 2020-12-04 2020-12-04 Coding method, decoding method, coding apparatus, and decoding apparatus

Publications (2)

Publication Number Publication Date
CN114731406A true CN114731406A (en) 2022-07-08
CN114731406B CN114731406B (en) 2024-10-11

Family

ID=81853685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080078054.3A Active CN114731406B (en) 2020-12-04 2020-12-04 Encoding method, decoding method, encoding device, and decoding device

Country Status (2)

Country Link
CN (1) CN114731406B (en)
WO (1) WO2022116207A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024012474A1 (en) * 2022-07-14 2024-01-18 杭州海康威视数字技术股份有限公司 Image decoding method and apparatus based on neural network, image encoding method and apparatus based on neural network, and device thereof
WO2024213145A1 (en) * 2023-04-14 2024-10-17 杭州海康威视数字技术股份有限公司 Decoding method and apparatus, encoding method and apparatus, device, and medium
TWI859971B (en) 2022-07-14 2024-10-21 大陸商杭州海康威視數字技術股份有限公司 Method and apparatus for image decoding and encoding based on neural network, and device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115720257B (en) * 2022-10-13 2023-06-23 华能信息技术有限公司 Communication security management method and system for video conference system
CN118354084A (en) * 2023-01-13 2024-07-16 杭州海康威视数字技术股份有限公司 Decoding and encoding method, device and equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229644A (en) * 2016-12-15 2018-06-29 上海寒武纪信息科技有限公司 The device of compression/de-compression neural network model, device and method
CN108271026A (en) * 2016-12-30 2018-07-10 上海寒武纪信息科技有限公司 The device and system of compression/de-compression, chip, electronic device
CN110870310A (en) * 2018-09-04 2020-03-06 深圳市大疆创新科技有限公司 Image encoding method and apparatus
WO2020188004A1 (en) * 2019-03-18 2020-09-24 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Methods and apparatuses for compressing parameters of neural networks

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013121793A1 (en) * 2012-02-16 2013-08-22 日本放送協会 Multi-channel sound system, transmitting device, receiving device, program for transmitting, and program for receiving
CN103580773A (en) * 2012-07-18 2014-02-12 中兴通讯股份有限公司 Method and device for transmitting data frame
CN109874018A (en) * 2018-12-29 2019-06-11 深兰科技(上海)有限公司 Image encoding method, system, terminal and storage medium neural network based

Also Published As

Publication number Publication date
CN114731406B (en) 2024-10-11
WO2022116207A1 (en) 2022-06-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant