
CN113556582A - Video data processing method, device, equipment and storage medium - Google Patents

Video data processing method, device, equipment and storage medium

Info

Publication number
CN113556582A
Authority
CN
China
Prior art keywords
frame
sequence
determining
frame sequence
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110874693.7A
Other languages
Chinese (zh)
Inventor
戴卫斌
王一
李伟琪
龚力
于波
周宇虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eswin Computing Technology Co Ltd
Haining Eswin IC Design Co Ltd
Original Assignee
Beijing Eswin Computing Technology Co Ltd
Haining Eswin IC Design Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eswin Computing Technology Co Ltd, Haining Eswin IC Design Co Ltd filed Critical Beijing Eswin Computing Technology Co Ltd
Priority to CN202110874693.7A priority Critical patent/CN113556582A/en
Publication of CN113556582A publication Critical patent/CN113556582A/en
Priority to PCT/CN2021/142584 priority patent/WO2023005140A1/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234381Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by altering the temporal resolution, e.g. decreasing the frame rate by frame skipping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440281Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the temporal resolution, e.g. by frame skipping

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a video data processing method, a video data processing device, video data processing equipment and a storage medium. The method comprises the following steps: determining an initial frame sequence of a video to be processed, and determining the brightness of each pixel point of each image frame in the initial frame sequence; performing frame extraction processing on the initial frame sequence based on the brightness of each pixel point of each image frame in the initial frame sequence, and taking the frame sequence after frame extraction as a first frame sequence; encoding the first frame sequence to obtain encoded data of the video to be processed, wherein the encoded data carries the reference frame sequence number of each reference frame, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process; and sending the encoded data to a first device, so that the first device determines a second frame sequence based on the encoded data and the reference frames corresponding to the reference frame sequence numbers, and determines the video to be played based on the second frame sequence. By adopting the embodiment of the application, bandwidth is saved in the video data transmission process, and the applicability is high.

Description

Video data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, and a storage medium for processing video data.
Background
With the rapid development of internet technology, multimedia data such as short videos and online live streams have grown rapidly, and the visual-quality requirements of a continuously growing population of video users are becoming higher and higher. This poses a challenge for video data processing methods: how to reduce bandwidth consumption while ensuring the visual quality of the video.
At present, when a video to be processed is coded, a fixed mode is often adopted for frame extraction processing so as to reduce the frame rate of the video to be processed. However, the conventional frame extraction processing method often causes the problems of discontinuity of video pictures and the like. After the encoded data of the video to be processed is decoded, frame interpolation processing is often adopted to improve the frame rate of a frame sequence obtained by decoding, so as to improve the video quality. However, the conventional video frame interpolation methods, such as frame sampling and frame mixing, may cause problems of motion jitter and picture blurring.
Disclosure of Invention
The embodiment of the application provides a video data processing method, a video data processing device, a video data processing apparatus and a storage medium, which can save bandwidth in a video data transmission process, improve video definition and have high applicability.
In a first aspect, an embodiment of the present application provides a video data processing method, where the method includes:
determining an initial frame sequence of a video to be processed, and determining the brightness of each pixel point of each image frame in the initial frame sequence;
based on the brightness of each pixel point of each image frame in the initial frame sequence, performing frame extraction processing on the initial frame sequence, and taking the frame sequence after frame extraction as a first frame sequence;
coding the first frame sequence to obtain coded data of the video to be processed, wherein the coded data carries reference frame serial numbers of all reference frames, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process;
and transmitting the encoded data to a first device so that the first device determines a second frame sequence based on the encoded data and the reference frames corresponding to the reference frame numbers, and determines a playing video based on the second frame sequence.
In a second aspect, an embodiment of the present application provides a video data processing method, where the method includes:
acquiring encoded data sent by second equipment, and decoding the encoded data to obtain a first frame sequence, wherein the first frame sequence is a frame sequence obtained by performing frame extraction processing on an initial frame sequence of a video to be processed by the second equipment;
determining each group of associated reference frames in the reference frames corresponding to the first frame sequence based on each reference frame sequence number carried by the encoded data, wherein each group of associated reference frames comprises two reference frames adjacent to a target frame sequence extracted in the frame extraction process;
for each group of the associated reference frames, determining a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, and an occlusion weight and a reconstruction residual in a frame prediction process based on a first reference frame and a second reference frame in the group of the associated reference frames, and determining a target predicted frame corresponding to the group of the associated reference frames based on the first predicted frame, the second predicted frame, the occlusion weight and the reconstruction residual;
and performing frame interpolation processing on the target prediction frames corresponding to each group of the associated reference frames to obtain a second frame sequence, and obtaining a played video based on the second frame sequence.
In a third aspect, an embodiment of the present application provides a video data processing apparatus, including:
the brightness determining module is used for determining an initial frame sequence of a video to be processed and determining the brightness of each pixel point of each image frame in the initial frame sequence;
a frame sequence determining module, configured to perform frame extraction processing on the initial frame sequence based on brightness of each pixel point of each image frame in the initial frame sequence, and use the frame sequence after frame extraction as a first frame sequence;
a coding module, configured to code the first frame sequence to obtain coded data of the video to be processed, where the coded data carries a reference frame number of each reference frame, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process;
and a sending module, configured to send the encoded data to a first device, so that the first device determines a second frame sequence based on the encoded data and reference frames corresponding to the reference frame numbers, and determines to play a video based on the second frame sequence.
In a fourth aspect, an embodiment of the present application provides a video data processing apparatus, including:
the decoding module is used for acquiring encoded data sent by second equipment and decoding the encoded data to obtain a first frame sequence, wherein the first frame sequence is a frame sequence obtained by performing frame extraction processing on an initial frame sequence of a video to be processed by the second equipment;
a reference frame determining module, configured to determine, based on the reference frame sequence numbers carried by the encoded data, groups of associated reference frames in reference frames corresponding to the first frame sequence, where each group of associated reference frames includes two reference frames adjacent to a target frame sequence extracted in the frame extraction process;
a frame prediction module, configured to determine, for each group of associated reference frames, a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, and an occlusion weight and a reconstruction residual in a frame prediction process based on a first reference frame and a second reference frame in the group of associated reference frames, and determine a target predicted frame corresponding to the group of associated reference frames based on the first predicted frame, the second predicted frame, the occlusion weight, and the reconstruction residual;
and the video determining module is used for performing frame interpolation processing on the target prediction frames corresponding to the associated reference frames to obtain a second frame sequence, and obtaining a played video based on the second frame sequence.
In a fifth aspect, an embodiment of the present application provides an electronic device, including a processor and a memory, where the processor and the memory are connected to each other;
the memory is used for storing computer programs;
the processor is configured to execute any one of the video data processing methods of the first aspect and/or the second aspect when the computer program is called.
In a sixth aspect, the present application provides a computer-readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement any one of the video data processing methods of the first aspect and/or the second aspect.
In the embodiment of the application, by performing frame extraction processing on the initial frame sequence of the video to be processed, the initial frame sequence with a high frame rate can be converted into the first frame sequence with a low frame rate, so that encoding the first frame sequence greatly reduces the size of the video data, and the data traffic consumed in transmitting the encoded data is correspondingly reduced, thereby saving bandwidth cost. Moreover, for each group of associated reference frames in the decoded first frame sequence, by determining the first predicted frame and the second predicted frame corresponding to the first reference frame and the second reference frame in each group of associated reference frames, as well as the occlusion weight and the reconstruction residual in the frame prediction process, the occlusion information in the frame prediction process and the detail information of each image frame can be fully considered, and problems such as object jitter and edge blurring in the frame prediction process can be effectively alleviated, thereby improving video definition and the video viewing experience.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the embodiments will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained by those skilled in the art from these drawings without creative effort.
Fig. 1 is a network architecture diagram of a video data processing method provided in an embodiment of the present application;
fig. 2 is a flow chart of a video data processing method according to an embodiment of the present application;
fig. 3 is another schematic flow chart of a video data processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a scenario for determining contextual characteristics according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an optical flow field estimation model provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a scenario for determining residual features according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a scenario for determining a fusion feature according to an embodiment of the present application;
FIG. 8 is a schematic diagram of another scenario for determining contextual characteristics according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another scenario for determining contextual characteristics according to an embodiment of the present application;
FIG. 10 is a schematic view of a scene for determining reconstructed residual and occlusion weights according to an embodiment of the present application;
FIG. 11 is a schematic diagram of another scenario for determining occlusion weights and reconstructing residuals according to an embodiment of the present application;
FIG. 12 is a diagram illustrating a scenario for determining a target predicted frame according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of another video data processing apparatus according to an embodiment of the present application;
fig. 15 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a network architecture diagram of a video data processing method according to an embodiment of the present application. As shown in fig. 1, after acquiring the video to be processed, the second device 100 may determine an initial frame sequence of the video to be processed, and determine the brightness of each pixel point of each image frame in the initial frame sequence. Further, the second device 100 may perform frame extraction processing on the initial frame sequence based on the brightness of each pixel point of each image frame in the initial frame sequence, and use the frame sequence after frame extraction as the first frame sequence, that is, the second device 100 may reduce the frame rate of the video to be processed by performing frame extraction processing on the initial frame sequence.
After obtaining the first frame sequence, the second device 100 may encode the first frame sequence to obtain encoded data of the video to be processed, and then send the encoded data to the first device through the network connection. The encoded data sent by the second device 100 carries the reference frame number of each reference frame, where each reference frame is an image frame adjacent to the target frame sequence extracted in the frame extraction process.
Correspondingly, after acquiring the encoded data sent by the second device 100, the first device 200 may decode the encoded data to obtain a first frame sequence, and determine, based on each reference frame sequence number carried by the encoded data, each group of associated reference frames in the reference frame frames corresponding to the first frame sequence. Wherein each group of associated reference frames comprises two reference frames adjacent to the target frame sequence extracted in the frame extraction process.
For each group of associated reference frames, the first device 200 may determine, based on a first reference frame and a second reference frame in the group of associated reference frames, a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, as well as an occlusion weight and a reconstruction residual in the frame prediction process, and determine, based on the first predicted frame, the second predicted frame, the occlusion weight, and the reconstruction residual, a target predicted frame corresponding to the group of associated reference frames.
further, the second device 100 may perform frame interpolation on the target predicted frames corresponding to each group of associated reference frames to obtain a second frame sequence, and obtain the played video based on the second frame sequence.
The second device 100 may be a video capture device, such as a camera device, a video generation device, or a video processing device, and may be determined based on the actual application scene requirement, which is not limited herein.
The first device 200 may be a video forwarding device, a video playing device, or the like, and may be determined based on the actual application scene requirement, which is not limited herein.
Referring to fig. 2, fig. 2 is a schematic flow chart of a video data processing method according to an embodiment of the present application. When the video data processing method provided by the embodiment of the present application is applied to the second device, the method may specifically include the following steps:
step S21, determining an initial frame sequence of the video to be processed, and determining the brightness of each pixel point of each image frame in the initial frame sequence.
In some feasible embodiments, the video to be processed may be a video acquired by the second device in real time, may also be a video generated by the second device based on video processing software, may also be a video acquired by the second device from a network, a local storage space, a cloud storage space, or the like, and may specifically be determined based on requirements of an actual application scene, which is not limited herein.
In some possible embodiments, for the video to be processed, an initial frame sequence of the video to be processed may be determined, and the brightness of each pixel point of each image frame in the initial frame sequence may be determined. Specifically, for any image frame in the initial frame sequence, the pixel value of each color channel of each pixel point of the image frame may be determined, and for each pixel point of the image frame, the brightness of the pixel point may be determined based on the pixel value of each color channel of the pixel point.
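As a minimal illustration of this step, the sketch below derives a per-pixel luminance map from the color-channel values of one image frame. The patent does not specify the channel weighting, so the ITU-R BT.601 luma coefficients used here are an assumption; any reasonable combination of the channel values could stand in for it.

```python
import numpy as np

def pixel_luminance(frame_rgb: np.ndarray) -> np.ndarray:
    """Compute per-pixel luminance for an H x W x 3 RGB image frame.

    The patent only states that luminance is derived from the pixel values
    of the color channels; the BT.601 luma weights used here are an
    illustrative assumption.
    """
    r = frame_rgb[..., 0].astype(np.float32)
    g = frame_rgb[..., 1].astype(np.float32)
    b = frame_rgb[..., 2].astype(np.float32)
    return 0.299 * r + 0.587 * g + 0.114 * b
```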
Step S22, based on the brightness of each pixel point of each image frame in the initial frame sequence, performs frame extraction processing on the initial frame sequence, and takes the frame sequence after frame extraction as the first frame sequence.
In some possible embodiments, based on the brightness of each pixel point of each image frame in the initial frame sequence, the initial frame sequence is subjected to frame extraction, and the frame sequence after frame extraction is taken as the first frame sequence. By performing frame extraction processing on the initial frame sequence, the initial frame sequence with a high frame rate of the video to be processed can be converted into the first frame sequence with a low frame rate.
Specifically, for any image frame in the initial frame sequence, the luminance difference between each pixel point of the image frame and the corresponding pixel point of the previous image frame may be determined, and further, the total luminance difference between the image frame and the previous image frame may be determined based on the luminance difference corresponding to each pixel point of the image frame.
For any image frame in the initial frame sequence, the luminance difference between a pixel point of the image frame and the corresponding pixel point of the previous image frame is the absolute value of the difference between the luminance values of the two pixel points.
For example, for any image frame in the initial frame sequence, the total brightness difference Δ_light between the image frame and the previous image frame is:

Δ_light = Σ_{i,j} |I_{i,j}(t) − I_{i,j}(t−1)|

where i and j are used to represent the position of a pixel point, I_{i,j}(t) represents the luminance of pixel point (i, j) of the t-th image frame, I_{i,j}(t−1) represents the luminance of pixel point (i, j) of the (t−1)-th image frame, and |I_{i,j}(t) − I_{i,j}(t−1)| represents the absolute value of the luminance difference between the t-th image frame and the (t−1)-th image frame at pixel point (i, j).
Further, if the total brightness difference between any image frame and the previous image frame in the initial frame sequence of the video to be processed is greater than the first threshold, it indicates that the scene changes significantly from the previous image frame to this image frame; therefore, an image frame whose total brightness difference is greater than the first threshold may be determined as an active frame. If the total brightness difference between any image frame and the previous image frame is less than or equal to the first threshold, it indicates that the scene changes little from the previous image frame to this image frame; therefore, an image frame whose total brightness difference is less than or equal to the first threshold may be determined as a still frame.
As an example, for any image frame of an initial sequence of frames, the image frame may be marked to distinguish the image frame as either an active frame or a still frame.
K(t) = 1, if Δ_light > T1
K(t) = 0, if Δ_light ≤ T1

As shown in the above formula, K(t) represents the mark of the t-th image frame. If the total brightness difference Δ_light between the t-th image frame and the previous image frame is greater than the first threshold T1, the t-th image frame is marked as 1 to indicate that it is an active frame. If the total brightness difference Δ_light between the t-th image frame and the previous image frame is not greater than the first threshold T1, the t-th image frame is marked as 0 to indicate that it is a still frame.
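The two formulas above can be combined into a short sketch that computes Δ_light for each frame and assigns the mark K(t). It assumes the per-frame luminance maps produced earlier; treating the first frame (which has no predecessor) as active is a convention added here, not something the text states.

```python
import numpy as np

def mark_frames(luma_frames: list, t1: float) -> list:
    """Mark each frame as active (1) or still (0).

    `luma_frames` holds the per-frame luminance maps; t1 is the first
    threshold T1 from the formula for K(t).
    """
    if not luma_frames:
        return []
    marks = [1]  # no previous frame to compare against; treated as active (a convention)
    for t in range(1, len(luma_frames)):
        delta_light = np.abs(luma_frames[t] - luma_frames[t - 1]).sum()
        marks.append(1 if delta_light > t1 else 0)
    return marks
```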
Further, after determining the active frames and the static frames in the initial frame sequence, the initial frame sequence may be decimated based on the active frames and the static frames in the initial frame sequence, so as to obtain the first frame sequence. Specifically, a continuous active frame sequence and a continuous static frame sequence in the initial frame sequence can be determined, and the continuous active frame sequence and the continuous static frame sequence in the initial frame sequence are subjected to frame extraction processing.
As an example, any single image frame and/or any number of consecutive image frames in a run of consecutive active frames, other than the first and last image frames of the run, may be used as a target frame sequence, and the target frame sequence may be extracted from the consecutive active frames. Similarly, any single image frame and/or any number of consecutive image frames in a run of consecutive still frames, other than the first and last image frames of the run, may be used as a target frame sequence, and the target frame sequence may be extracted from the consecutive still frames. After the target frame sequences are extracted from the continuous active frame sequences and the continuous still frame sequences in the above manner, the initial frame sequence with the target frame sequences extracted may be used as the first frame sequence.
In some possible embodiments, in order to concentrate the frame extraction on frames within the same scene, an initial frame rate corresponding to the video to be processed may be determined, the initial frame sequence of the video to be processed may be divided into at least one subframe sequence based on the initial frame rate, the continuous active frame sequences and continuous still frame sequences in each subframe sequence may then be determined, and a target frame sequence may be extracted from each continuous active frame sequence and each continuous still frame sequence.
The target frame sequence extracted from a continuous active frame sequence or a continuous still frame sequence in each subframe sequence is likewise any single image frame or any number of consecutive image frames in the corresponding continuous active frame sequence or still frame sequence, other than the first and last frames.
For example, the duration of the video to be processed is 10s, and the initial frame rate is 24 Hz. The initial frame sequence may be divided into 10 subframe sequences of duration 1s, each subframe sequence comprising 24 frame images, and each subframe sequence may be decimated.
In some possible embodiments, for each continuous active frame sequence in each subframe sequence, if the number of active frames in the continuous active frame sequence is greater than the second threshold, the continuous active frame sequence is subjected to frame extraction processing; that is, any single image frame and/or any number of consecutive image frames in the continuous active frames, other than the first and last image frames, are used as a target frame sequence, and the target frame sequence is extracted from the continuous active frames. For each continuous still frame sequence in each subframe sequence, if the number of still frames in the continuous still frame sequence is greater than the third threshold, the continuous still frame sequence is subjected to frame extraction processing; that is, any single image frame and/or any number of consecutive image frames in the continuous still frames, other than the first and last image frames, are used as a target frame sequence, and the target frame sequence is extracted from the continuous still frames. For any subframe sequence, if the number of active frames in each continuous active frame sequence is less than or equal to the second threshold and the number of still frames in each continuous still frame sequence is less than or equal to the third threshold, no frame extraction is performed on the subframe sequence. The specific frame extraction rule can be expressed as follows:
P = I_{2,3,…,last−1}(K(t) = 1), if N(K(t) = 1) > T2
P = I_{2,3,…,last−1}(K(t) = 0), if N(K(t) = 0) > T3

where P denotes the extracted target frame sequence, N(K(t) = 1) denotes the number of active frames in a continuous active frame sequence, N(K(t) = 0) denotes the number of still frames in a continuous still frame sequence, T2 and T3 denote the second threshold and the third threshold respectively, I_{2,3,…,last−1}(K(t) = 1) denotes any intermediate frame in a continuous active frame sequence, and I_{2,3,…,last−1}(K(t) = 0) denotes any intermediate frame in a continuous still frame sequence.
For example, a certain subframe sequence is shown in table 1:
TABLE 1
Marking:                   1    0    1      0    0    0    1    1    0       1       0    0
Frame number:              1    2    3-20   21   22   23   24   25   26-35   36-65   66   67
Whether to extract frames: No   No   Yes    No   No   No   No   No   Yes     Yes     No   No
In table 1, frame numbers 3-20 correspond to consecutive active frames, frame numbers 26-35 correspond to consecutive still frames, and frame numbers 36-65 correspond to consecutive active frames. If the second threshold and the third threshold are 4, any one frame image frame or continuous multi-frame image frames except the first frame and the last frame in the continuous active frames corresponding to the frame numbers 3-20, the continuous static frames corresponding to the frame numbers 26-35 and the continuous active frames corresponding to the frame numbers 36-65 can be extracted, so that the subframe sequence after frame extraction is determined as the first frame sequence.
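A compact sketch of this run-based decimation is given below. It scans the marks of one subframe sequence and, for every run of identical marks longer than the relevant threshold, keeps only the run's first and last frame; extracting all intermediate frames of a qualifying run is only one of the choices the text allows and is used here as a simplifying assumption.

```python
def decimate_subsequence(marks: list, t2: int, t3: int) -> list:
    """Return the 0-based indices (within the subframe sequence) of the
    frames to keep, i.e. the subframe sequence after frame extraction.

    A run of consecutive active frames longer than T2, or of consecutive
    still frames longer than T3, has its intermediate frames extracted.
    """
    keep = []
    i, n = 0, len(marks)
    while i < n:
        j = i
        while j < n and marks[j] == marks[i]:
            j += 1                      # [i, j) is a run of identical marks
        threshold = t2 if marks[i] == 1 else t3
        if j - i > threshold:
            keep.extend([i, j - 1])     # keep only the run's boundary frames
        else:
            keep.extend(range(i, j))    # run too short: keep every frame
        i = j
    return keep
```

Applied to the marks of Table 1 with T2 = T3 = 4, the sketch keeps only the boundary frames of the runs 3-20, 26-35 and 36-65 and every frame of the shorter runs.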
And step S23, coding the first frame sequence to obtain coded data of the video to be processed, wherein the coded data carries the reference frame serial number of each reference frame.
In some possible embodiments, after the first frame sequence is obtained by performing the frame decimation processing on the initial frame sequence, the first frame sequence may be encoded to obtain encoded data of the video to be processed.
Specifically, the encoding modes adopted in encoding the first frame sequence include, but are not limited to, h.264, h.265, AVS2, AV1, and the like, and may be determined based on actual application scenario requirements, and are not limited herein.
The coded data of the video to be processed carries the reference frame serial number of each reference frame, and each reference frame is an image frame adjacent to the target frame sequence extracted in the frame extraction process. That is, after the target frame sequence is extracted from the initial frame sequence to obtain the first frame sequence, two image frames adjacent to the extracted target frame sequence in the first frame sequence may be determined as reference frames, and the frame number of each reference frame may be determined.
As shown in table 1, in the case where the target frame sequence extracted from the active frame sequence is the image frame of frame number 4 to frame number 5 in the active frame sequence of frame number 3 to frame number 20, the image frames adjacent to the target frame sequence are the image frame of frame number 3 and the image frame of frame number 6, and are determined as two reference frames.
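Given the frame numbers retained in the first frame sequence, the encoder-side bookkeeping of the reference frames can be sketched as follows: whenever two consecutive kept frames are not adjacent in the initial numbering, a target frame sequence was extracted between them, and the two kept frames form one group of associated reference frames. The function below is an illustration, not the patent's exact implementation.

```python
def reference_frame_pairs(kept_frame_numbers: list) -> list:
    """Return one (first reference frame, second reference frame) pair per
    extracted target frame sequence. `kept_frame_numbers` are the frame
    numbers of the first frame sequence, expressed in the numbering of the
    initial frame sequence and sorted in ascending order."""
    pairs = []
    for prev, nxt in zip(kept_frame_numbers, kept_frame_numbers[1:]):
        if nxt - prev > 1:   # at least one frame was extracted between them
            pairs.append((prev, nxt))
    return pairs
```

For the example above, where the image frames of frame numbers 4 and 5 are extracted, the kept frame numbers around the gap yield the pair (3, 6).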
When the first frame sequence is encoded, the reference frame sequence number of each reference frame can be encoded together with the first frame sequence, so that the encoded data of the video to be processed carries each reference frame sequence number. Or after the first frame sequence is encoded to obtain the encoded data of the video to be processed, the reference frame sequence numbers and the encoded data are further processed, so that the encoded data of the video to be processed carries the reference frame sequence numbers.
Step S24, sending the encoded data to the first device, so that the first device determines a second frame sequence based on the encoded data and the reference frames corresponding to the reference frame numbers, and determines to play the video based on the second frame sequence.
In some possible embodiments, after obtaining the encoded data carrying the reference frame sequence numbers of the reference frames, the encoded data may be sent to the first device 200, so that the first device 200 may determine the second frame sequence based on the encoded data and the reference frames corresponding to the reference frame sequence numbers, and determine to play the video based on the second frame sequence.
Specifically, the manner in which the second device 100 sends the encoded data to the first device 200 includes, but is not limited to, Content Delivery Network (CDN) transmission technology, Peer-to-Peer (P2P) network transmission technology, and PCDN transmission technology combining CDN and P2P.
The second device 100 transmits the encoded data to the first device 200, and also transmits the frame numbers of the reference frames to the first device 200.
In the embodiment of the application, by performing frame extraction processing on the initial frame sequence of the video to be processed, the initial frame sequence with the high frame rate can be converted into the first frame sequence with the low frame rate, so that the size of video data can be greatly reduced by encoding the first frame sequence, and further, data traffic consumed by data transmission is correspondingly reduced and encoded, thereby achieving the effect of saving bandwidth cost.
In some possible embodiments, after the second device 100 sends the encoded data carrying the reference frame numbers to the first device 200, a specific processing manner of the encoded data by the first device 200 may be as shown in fig. 3, where fig. 3 is another schematic flow diagram of the video data processing method provided by the embodiment of the present application. When the video data processing method provided in the embodiment of the present application is applied to the first device 200, the method may specifically include the following steps:
step S31, obtaining the encoded data sent by the second device 100, and decoding the encoded data to obtain a first frame sequence.
In some possible embodiments, after acquiring the encoded data sent by the second device 100, the first device 200 may decode the encoded data based on the encoding technique adopted by the second device 100 to obtain the first frame sequence. The first frame sequence is the frame sequence obtained by the second device 100 performing frame extraction on the initial frame sequence of the video to be processed, that is, the frame sequence remaining after the target frame sequences are extracted from the initial frame sequence.
Step S32, determining each group of associated reference frames in the reference frames corresponding to the first frame sequence based on each reference frame sequence number carried by the encoded data.
In some possible embodiments, the sequence number of each image frame in the first frame sequence obtained by the first device 200 after decoding the encoded data is the frame sequence number in the initial frame sequence corresponding to the video to be processed. Based on this, the first device 200 may determine the reference frames in the first frame sequence based on the reference frame sequence numbers after acquiring the reference frame sequence numbers.
Further, the first device 200 may determine groups of associated reference frames from the reference frames in the first frame sequence, where each group of associated reference frames includes two reference frames adjacent to the target frame sequence extracted by the second device 100 during the frame extraction process performed on the initial frame sequence. The first device 200 may further determine a target prediction frame corresponding to each group of associated reference frames, and perform frame interpolation on the target prediction frame to obtain a second frame sequence. The specific implementation manner of determining the target prediction frame corresponding to each group of associated reference frames and performing interpolation processing on the target prediction frame to obtain the second frame sequence by the first device 200 is described below, and will not be described herein.
Step S33, for each group of associated reference frames, determining, based on a first reference frame and a second reference frame in the group of associated reference frames, a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, as well as an occlusion weight and a reconstruction residual in the frame prediction process, and determining a target predicted frame corresponding to the group of associated reference frames based on the first predicted frame, the second predicted frame, the occlusion weight, and the reconstruction residual.
In some possible embodiments, for each set of associated reference frames, a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame may be determined based on the first reference frame and the second reference frame in the set of associated reference frames.
For each group of associated reference frames, the first reference frame in the group of associated reference frames is a reference frame with a smaller reference frame number, and the second reference frame is a reference frame with a larger reference frame number. The first predicted frame and the second predicted frame are both image frames between the first reference frame and the second reference frame.
Specifically, for each set of associated reference frames, a first optical flow field corresponding to the first reference frame and a second optical flow field corresponding to the second reference frame may be determined based on the first reference frame and the second reference frame in the set of associated reference frames.
An optical flow field is the two-dimensional instantaneous velocity field formed by all pixel points in an image. It captures the change of pixels in the time domain and the correlation between adjacent frames, and is used to find the correspondence between the previous frame and the current frame.
For each group of associated reference frames, when the optical flow fields corresponding to the first reference frame and the second reference frame are determined, feature extraction can be performed on the first reference frame to obtain a first initial feature, feature extraction can be performed on the second reference frame to obtain a second initial feature, and then the associated features corresponding to the first reference frame and the second reference frame are obtained based on the first initial feature and the second initial feature.
When feature extraction is performed on the first reference frame and the second reference frame, feature extraction can be performed on each reference frame based on a neural network and the like to obtain corresponding initial features. After the first initial feature and the second initial feature are obtained, the associated features of the first initial feature and the second initial feature may be obtained based on feature splicing, feature fusion, or further processing based on other neural network models, and a specific implementation manner may be determined based on requirements of an actual application scenario, which is not limited herein.
For the first reference frame, a first context feature of the first reference frame may be determined, and based on the first context feature and the association feature, a first optical flow field corresponding to the first reference frame may be determined. For the second reference frame, a second context feature of the second reference frame may be determined, and based on the second context feature and the above-mentioned associated feature, a second optical flow field corresponding to the second reference frame may be determined.
Wherein the context feature of each reference frame may be implemented based on a context feature extraction network. Referring to fig. 4, fig. 4 is a schematic diagram of a scenario for determining a contextual characteristic according to an embodiment of the present application. The context feature extraction network shown in fig. 4 includes a plurality of convolutional layer and activation function combinations connected in series. For any one of the first reference frame and the second reference frame, the reference frame may be convolved based on the first convolution layer to obtain a first convolution feature, and the first convolution feature is processed through the first activation function to obtain a feature map of the reference frame. And further, continuously performing convolution processing on the feature map obtained by the last activation function based on the second convolution layer to obtain a second convolution feature, and processing the second convolution feature through the second activation function to obtain a second feature map of the reference frame. By analogy, the feature map obtained by each activation function in fig. 4 can be determined as the context feature of the reference frame.
The number of convolutional layers and activation functions in the feature extraction network shown in fig. 4 may be specifically determined based on the requirements of the actual application scenario, and is not limited herein.
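A sketch of such a context feature extraction network is shown below, in the spirit of fig. 4: a chain of convolution + activation combinations whose every activation output is kept as one feature map of the context feature. The channel widths, the stride pattern (full resolution for the first stage, halving afterwards, which also matches the optical flow field weights discussed later), and the use of PReLU are assumptions for illustration.

```python
import torch
import torch.nn as nn

class ContextFeatureExtractor(nn.Module):
    """Stack of convolution + activation pairs; each activation output is one
    feature map of the context feature (C^1, C^2, ..., C^n)."""

    def __init__(self, in_channels: int = 3,
                 widths=(32, 64, 96, 128, 160),
                 strides=(1, 2, 2, 2, 2)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_channels
        for w, s in zip(widths, strides):
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, w, kernel_size=3, stride=s, padding=1),
                nn.PReLU(w),
            ))
            prev = w

    def forward(self, frame: torch.Tensor) -> list:
        feats, x = [], frame
        for stage in self.stages:
            x = stage(x)
            feats.append(x)   # feature map produced by this activation function
        return feats
```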
When determining the Optical Flow fields corresponding to the first reference frame and the second reference frame of each group of associated reference frames, the determination can be performed based on an Optical Flow Field estimation model (RAFT).
As an example, referring to fig. 5, fig. 5 is a schematic structural diagram of the optical flow field estimation model provided in an embodiment of the present application. As shown in fig. 5, for any group of associated reference frames, the first reference frame I_a and the second reference frame I_d in the group of associated reference frames are input into a feature coding module, which performs feature extraction on the first reference frame I_a and the second reference frame I_d respectively to obtain a first initial feature and a second initial feature; feature association is then performed based on the first initial feature and the second initial feature to obtain the associated feature. Here, I denotes a reference frame, and a and d are respectively the position information (such as frame number or temporal position) of the two reference frames.
For the first reference frame I_a, the first context feature C_a of the first reference frame I_a may be determined based on the context feature extraction network. The first context feature may be represented as C_a = {C_a^1, C_a^2, …, C_a^n}, where C_a^1, C_a^2, …, C_a^n are the feature maps obtained from the successive convolutional-layer and activation-function combinations. The first context feature C_a of the first reference frame I_a and the associated feature are input into a recurrent neural network to obtain the first optical flow field F_b→a of the first reference frame I_a, where b is the position information of the target predicted frame corresponding to the group of associated reference frames and a < b < d.
Similarly, for the second reference frame I_d, the second context feature C_d of the second reference frame I_d may be determined by the context feature extraction network. The second context feature may be represented as C_d = {C_d^1, C_d^2, …, C_d^n}, where C_d^1, C_d^2, …, C_d^n are the feature maps obtained from the successive convolutional-layer and activation-function combinations. The second context feature C_d of the second reference frame I_d and the associated feature are input into the recurrent neural network to obtain the second optical flow field F_b→d of the second reference frame I_d, where b is the position information of the target predicted frame corresponding to the group of associated reference frames and a < b < d.
Further, the first reference frame is backward-mapped based on the first optical flow field to obtain the first predicted frame corresponding to the first reference frame, and the second reference frame is backward-mapped based on the second optical flow field to obtain the second predicted frame corresponding to the second reference frame. For example, the first reference frame I_a is backward-mapped based on the first optical flow field F_b→a to obtain the first predicted frame Î_b→a, and the second reference frame I_d is backward-mapped based on the second optical flow field F_b→d to obtain the second predicted frame Î_b→d.
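The backward mapping itself can be sketched with a standard warp: for every pixel of the target (predicted) frame, the flow gives the location to sample in the reference frame. The grid_sample-based implementation, the bilinear interpolation, and the pixel-displacement flow convention are assumptions; the patent only names the operation.

```python
import torch
import torch.nn.functional as F

def backward_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-map a reference frame (N, C, H, W) with an optical flow field
    (N, 2, H, W) that gives, for every pixel of the target frame, the
    displacement (in pixels) to the matching location in the reference frame."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, device=frame.device, dtype=frame.dtype),
        torch.arange(w, device=frame.device, dtype=frame.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]   # sampling positions in the reference frame
    grid_y = ys.unsqueeze(0) + flow[:, 1]
    # normalize to [-1, 1] as required by grid_sample
    grid = torch.stack(
        (2.0 * grid_x / (w - 1) - 1.0, 2.0 * grid_y / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)
```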
In some possible embodiments, for each set of associated reference frames, occlusion weights and reconstruction residuals in the frame prediction process are determined based on the first reference frame and the second reference frame in the set of associated reference frames. The reconstruction residual error is used for reducing the gradient descent problem in the frame prediction process, and the shielding weight is used for reducing the influence caused by the shaking of a moving object, the edge blurring and the like in the frame prediction process.
Specifically, a third contextual characteristic of the first reference frame and a fourth contextual characteristic of the second reference frame may be determined first. The third contextual feature of the first reference frame and the fourth contextual feature of the second reference frame may be determined based on the manner shown in fig. 4, or may be determined based on other contextual feature extraction networks, and may specifically be determined based on requirements of an actual application scenario, which is not limited herein.
Further, the occlusion weight and the reconstruction residual in the frame prediction process may be determined based on the first optical flow field, the second optical flow field, the first predicted frame, the second predicted frame, the third context feature, and the fourth context feature. If the first optical flow field, the second optical flow field, the first prediction frame, the second prediction frame, the third context feature and the fourth context feature are input into the deep neural network, the occlusion weight and the reconstruction residual error in the frame prediction process are obtained. The deep neural network includes, but is not limited to, fusion Net and U-Net, and may be determined based on the requirements of the actual application scenario, which is not limited herein.
In some possible embodiments, when determining the occlusion weight and the reconstructed residual in the frame prediction process based on the first optical flow field, the second optical flow field, the first predicted frame, the second predicted frame, the third context feature, and the fourth context feature, the residual feature may be determined based on the first optical flow field, the second optical flow field, the first predicted frame, and the second predicted frame.
As an example, referring to fig. 6, fig. 6 is a schematic view of a scenario for determining residual features according to an embodiment of the present application. As shown in fig. 6, the first optical flow field, the second optical flow field, the first prediction frame, and the second prediction frame are input to the convolution layer, processed by the convolutional neural network and the activation function, and then input to the residual block to obtain the residual features.
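A sketch of this residual-feature branch, under assumed channel widths and block count, is given below: the two optical flow fields and the two predicted frames are concatenated, passed through one convolution + activation, and refined by a few residual blocks.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain two-convolution residual block with a skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.PReLU(channels),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return x + self.body(x)

class ResidualFeatureHead(nn.Module):
    """Sketch of the fig. 6 pipeline: two optical flow fields (2 channels each)
    and two predicted frames (3 channels each) are concatenated, passed through
    a convolution + activation, and then through residual blocks. Channel width
    and block count are illustrative assumptions."""

    def __init__(self, channels: int = 64, num_blocks: int = 3):
        super().__init__()
        self.entry = nn.Sequential(
            nn.Conv2d(2 + 2 + 3 + 3, channels, 3, padding=1),
            nn.PReLU(channels),
        )
        self.blocks = nn.Sequential(*[ResidualBlock(channels) for _ in range(num_blocks)])

    def forward(self, flow_a, flow_d, pred_a, pred_d):
        x = torch.cat([flow_a, flow_d, pred_a, pred_d], dim=1)
        return self.blocks(self.entry(x))
```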
Further, a fusion feature may be determined based on the third context feature, the fourth context feature, and the residual feature. Referring to fig. 7, fig. 7 is a schematic view of a scenario for determining a fusion feature according to an embodiment of the present application. As shown in fig. 7, the residual feature is spliced with the first feature map of the third context feature and the first feature map of the fourth context feature to obtain a first spliced feature, and the first spliced feature is input into a convolution layer for downsampling convolution processing to obtain a first convolution feature. The first convolution feature is spliced with the second feature maps of the third context feature and the fourth context feature to obtain a second spliced feature, which is input into a convolution layer for downsampling convolution processing to obtain a second convolution feature. The second convolution feature is spliced with the third feature maps of the third context feature and the fourth context feature to obtain a third spliced feature, which is input into a convolution layer for downsampling convolution processing to obtain a third convolution feature. The third convolution feature is spliced with the fourth feature maps of the third context feature and the fourth context feature to obtain a fourth spliced feature, which is input into a convolution layer for downsampling convolution processing to obtain a fourth convolution feature. The fourth convolution feature is spliced with the fifth feature maps of the third context feature and the fourth context feature to obtain a fifth spliced feature, which is input into a convolution layer for upsampling convolution processing to obtain a fifth convolution feature.
Further, the fifth convolution feature and the third convolution feature are spliced to obtain a sixth spliced feature, which is input into a convolution layer for upsampling processing to obtain a sixth convolution feature. The sixth convolution feature and the second convolution feature are spliced to obtain a seventh spliced feature, which is input into a convolution layer for upsampling processing to obtain a seventh convolution feature. The seventh convolution feature and the first convolution feature are spliced to obtain an eighth spliced feature, which is input into a convolution layer for upsampling processing to obtain a ninth convolution feature. Finally, the ninth convolution feature and the residual feature are spliced to obtain the fusion feature.
In the method of determining the fusion feature shown in fig. 7, the number of convolution layers performing the upsampling processing is the same as the number of convolution layers performing the downsampling processing, and this number matches the number of feature maps in the third context feature (or the fourth context feature).
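The encoder-decoder fusion of fig. 7 can be sketched as the U-Net-like module below: downsampling convolutions that each first splice in the corresponding feature maps of the third and fourth context features, followed by upsampling convolutions with skip connections to the earlier convolution features, and a final splice with the residual feature. All channel widths, the strided/transposed convolutions, and the feature-map resolutions (full, 1/2, 1/4, 1/8 and 1/16 of the frame size, assuming dimensions divisible by 16) are illustrative assumptions layered on top of the textual description.

```python
import torch
import torch.nn as nn

def conv_down(in_ch: int, out_ch: int) -> nn.Module:
    # Strided convolution used for the downsampling steps (an assumption).
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1), nn.PReLU(out_ch))

def conv_up(in_ch: int, out_ch: int) -> nn.Module:
    # Transposed convolution used for the upsampling steps (an assumption).
    return nn.Sequential(nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1), nn.PReLU(out_ch))

class FusionUNet(nn.Module):
    """Sketch of the fig. 7 fusion; widths follow the ContextFeatureExtractor sketch above."""

    def __init__(self, res_ch: int = 64, ctx_ch=(32, 64, 96, 128, 160)):
        super().__init__()
        c1, c2, c3, c4, c5 = ctx_ch
        self.down1 = conv_down(res_ch + 2 * c1, 64)   # first convolution feature, 1/2 resolution
        self.down2 = conv_down(64 + 2 * c2, 96)       # second convolution feature, 1/4 resolution
        self.down3 = conv_down(96 + 2 * c3, 128)      # third convolution feature, 1/8 resolution
        self.down4 = conv_down(128 + 2 * c4, 160)     # fourth convolution feature, 1/16 resolution
        self.up5 = conv_up(160 + 2 * c5, 128)         # fifth convolution feature, back to 1/8
        self.up6 = conv_up(128 + 128, 96)             # spliced with the third convolution feature
        self.up7 = conv_up(96 + 96, 64)               # spliced with the second convolution feature
        self.up9 = conv_up(64 + 64, 64)               # spliced with the first convolution feature

    def forward(self, res_feat: torch.Tensor, ctx_a: list, ctx_d: list) -> torch.Tensor:
        # ctx_a / ctx_d: the five (possibly warped) feature maps of the two context features
        d1 = self.down1(torch.cat([res_feat, ctx_a[0], ctx_d[0]], dim=1))
        d2 = self.down2(torch.cat([d1, ctx_a[1], ctx_d[1]], dim=1))
        d3 = self.down3(torch.cat([d2, ctx_a[2], ctx_d[2]], dim=1))
        d4 = self.down4(torch.cat([d3, ctx_a[3], ctx_d[3]], dim=1))
        u5 = self.up5(torch.cat([d4, ctx_a[4], ctx_d[4]], dim=1))
        u6 = self.up6(torch.cat([u5, d3], dim=1))
        u7 = self.up7(torch.cat([u6, d2], dim=1))
        u9 = self.up9(torch.cat([u7, d1], dim=1))
        return torch.cat([u9, res_feat], dim=1)       # fusion feature
```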
In some feasible embodiments, when determining the fusion feature based on the third context feature, the fourth context feature and the residual feature, each feature map in the third context feature may be backward mapped based on the first optical flow field to obtain a fifth context feature, and each feature map in the fourth context feature may be backward mapped based on the second optical flow field to obtain a sixth context feature. Furthermore, the fusion feature is determined based on the fifth context feature, the sixth context feature and the residual feature, and the specific determination manner is the same as the implementation manner shown in fig. 7, and will not be described here.
As an example, referring to fig. 8, fig. 8 is a schematic diagram of another scenario for determining a contextual characteristic provided in an embodiment of the present application. As shown in fig. 8, after determining each feature map of the first reference frame based on the combination of each convolution layer and the activation parameter to obtain the third context feature, each feature map in the third context feature may be respectively mapped backward based on the first optical flow field to obtain a mapping feature map corresponding to each feature map, and then each mapping feature map may be determined as the fifth context feature.
Optionally, since the feature maps within the third context feature of the first reference frame (and likewise within the fourth context feature of the second reference frame) have different sizes, an optical flow field weight may be determined for each feature map in the third context feature, so as to determine, based on the optical flow field weight and the first optical flow field, a new optical flow field to be used when the feature map is backward mapped. Then, each feature map in the third context feature can be mapped backwards based on its corresponding new optical flow field to obtain a mapping feature map corresponding to the feature map, and the fifth context feature is determined based on the mapping feature maps corresponding to the feature maps in the third context feature.
As an example, referring to fig. 9, fig. 9 is a schematic diagram of another scenario for determining a contextual feature provided in an embodiment of the present application. As shown in fig. 9, after the third context feature of the first reference frame is obtained, the optical flow field weights, such as 1, 0.5, 0.25, 0.125, 0.0625, and the like, corresponding to each feature map in the third context feature may be determined, and then new optical flow fields, such as optical flow field 1, optical flow field 2, optical flow field 3, optical flow field 4, and optical flow field 5, corresponding to each feature map may be determined based on the first optical flow field and the optical flow field weights. And then mapping each feature map backwards based on the new optical flow field corresponding to each feature map to obtain a mapping feature map corresponding to each feature map, and determining each mapping feature map as a fifth context feature.
Similarly, for each feature map in the fourth context feature, the optical flow field weight corresponding to the feature map may be determined, so as to determine a new optical flow field corresponding to backward mapping of the feature map based on the optical flow field weight and the second optical flow field. And then, for each feature map in the fourth context feature, the feature map may be mapped backwards based on the new optical flow field corresponding to the feature map, so as to obtain a mapping feature map corresponding to the feature map. And determining a sixth context feature based on the mapping feature map corresponding to each feature map in the fourth context feature.
The optical flow field weights corresponding to the feature maps in the third context feature and the fourth context feature may be specifically determined based on an actual application scenario, which is not limited herein.
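As an illustrative example only, the backward mapping of a context pyramid with per-level optical flow field weights can be sketched as follows. The bilinear sampling, the border padding mode and the particular weight values are assumptions; as stated above, the embodiment leaves the weights application dependent:

```python
import torch
import torch.nn.functional as F


def backward_warp(feature, flow):
    """Backward-map `feature` (B, C, H, W) with an optical flow field (B, 2, H, W);
    flow[:, 0] is the horizontal and flow[:, 1] the vertical displacement in pixels."""
    b, _, h, w = feature.shape
    grid_y, grid_x = torch.meshgrid(
        torch.arange(h, device=feature.device, dtype=feature.dtype),
        torch.arange(w, device=feature.device, dtype=feature.dtype),
        indexing="ij")
    coords = torch.stack((grid_x, grid_y), dim=0).unsqueeze(0) + flow  # (B, 2, H, W)
    # Normalise sampling coordinates to [-1, 1] as expected by grid_sample.
    norm_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    norm_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid = torch.stack((norm_x, norm_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feature, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


def warp_context_pyramid(ctx_maps, flow, weights=(1.0, 0.5, 0.25, 0.125, 0.0625)):
    """Backward-map each feature map of a context pyramid with its own scaled flow."""
    warped = []
    for feat, weight in zip(ctx_maps, weights):
        h, w = feat.shape[-2:]
        # New optical flow field for this level: resized to the map's size and
        # scaled by the level's optical flow field weight.
        level_flow = F.interpolate(flow, size=(h, w), mode="bilinear",
                                   align_corners=False) * weight
        warped.append(backward_warp(feat, level_flow))
    return warped
```

Applying warp_context_pyramid to the third context feature with the first optical flow field yields the fifth context feature, and applying it to the fourth context feature with the second optical flow field yields the sixth context feature.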
In some possible embodiments, after the fusion feature is determined based on the residual feature, the fusion feature may be further processed to obtain the target feature. Specifically, as shown in fig. 10, which is a scene schematic diagram for determining a reconstruction residual and an occlusion weight according to an embodiment of the present application, the fusion feature may be input into a convolution layer for further processing, and sub-pixel convolution may be performed on the processing result to obtain a high-resolution target feature. The occlusion weight and the reconstruction residual in the frame prediction process are then determined based on the target feature.
Specifically, when determining the occlusion weight and the reconstruction residual in the frame prediction process based on the target feature, the number of channels corresponding to the target feature and the feature values corresponding to each channel may be determined. The feature value of the last channel is determined as the occlusion weight in the frame prediction process, and the reconstruction residual in the frame prediction process is determined based on the feature values of the other channels, for example by splicing the feature values corresponding to the channels other than the last channel to obtain the reconstruction residual in the frame prediction process.
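As an illustrative example only, the processing of the fusion feature into the target feature and the subsequent channel split can be sketched as follows. The input channel count, the upscale factor of the sub-pixel convolution, the three residual channels and the sigmoid applied to the occlusion channel are assumptions of the sketch:

```python
import torch
import torch.nn as nn


class ReconstructionHeadSketch(nn.Module):
    """Convolution + sub-pixel convolution, then split the target feature into
    the reconstructed residual (all channels but the last) and the occlusion
    weight (the last channel)."""

    def __init__(self, in_ch=96, upscale=2, residual_ch=3):
        super().__init__()
        out_ch = (residual_ch + 1) * upscale * upscale
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale)  # sub-pixel convolution step

    def forward(self, fusion_feature):
        target = self.shuffle(self.conv(fusion_feature))  # high-resolution target feature
        reconstruction = target[:, :-1]                   # splice of all but the last channel
        occlusion = torch.sigmoid(target[:, -1:])         # last channel as occlusion weight
        return reconstruction, occlusion
```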
The following further describes, with reference to fig. 11, how the occlusion weight and the reconstruction residual in the frame prediction process are determined based on the first optical flow field, the second optical flow field, the first predicted frame, the second predicted frame, the third context feature and the fourth context feature. Fig. 11 is a schematic diagram of another scene for determining occlusion weights and reconstruction residuals according to an embodiment of the present application. That is, the residual feature is determined based on the first optical flow field, the second optical flow field, the first predicted frame and the second predicted frame in the manner shown in fig. 6, and the fusion feature is determined based on the residual feature and the feature maps in the third context feature and the fourth context feature in the manner shown in fig. 7. The reconstruction residual and the occlusion weight in the frame prediction process are then determined based on the fusion feature in the manner shown in fig. 10.
In some possible embodiments, after determining the occlusion weight and the reconstructed residual in the frame prediction process, the target prediction frame corresponding to the set of associated reference frames may be determined based on the first prediction frame, the second prediction frame, the occlusion weight and the reconstructed residual. The specific determination method can be as follows:
Î_t = M ⊙ Î_t0 + (1 - M) ⊙ Î_t1 + A

wherein Î_t0 represents the first predicted frame, Î_t1 represents the second predicted frame, M represents the occlusion weight, A represents the reconstructed residual, Î_t represents the target predicted frame, and ⊙ indicates a dot product (element-wise multiplication) operation.
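As an illustrative example only, the composition above can be written as the following short function. The clamping to the displayable range and the assumption that the occlusion weight already lies in [0, 1] and broadcasts over the colour channels are additions of the sketch:

```python
import torch


def compose_target_frame(pred_first, pred_second, occlusion_weight, reconstructed_residual):
    """Blend the two predicted frames with the occlusion weight and add the residual."""
    target = (occlusion_weight * pred_first
              + (1.0 - occlusion_weight) * pred_second
              + reconstructed_residual)
    return target.clamp(0.0, 1.0)
```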
And step S34, performing frame interpolation processing on the target predicted frames corresponding to each group of associated reference frames to obtain a second frame sequence, and obtaining a played video based on the second frame sequence.
In some possible embodiments, for each group of associated reference frames, the target predicted frame corresponding to the group is a predicted image frame located between the first reference frame and the second reference frame of the group. On this basis, frame interpolation processing can be performed with the target predicted frames corresponding to the groups of associated reference frames: the target predicted frame corresponding to each group is inserted between the first reference frame and the second reference frame of that group, so that the second frame sequence is obtained based on the first frame sequence.
Further, after obtaining the second frame sequence, the first device may determine to play the video based on the second frame sequence, that is, the second frame sequence is a frame sequence corresponding to the video played by the first device.
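As an illustrative example only, the frame interpolation step can be sketched as follows, assuming that the two reference frames of each group are adjacent in the decoded first frame sequence and that exactly one target predicted frame is inserted per group (the embodiment does not fix the number of inserted frames):

```python
def build_second_sequence(first_sequence, predicted_frames):
    """first_sequence: list of decoded frames.
    predicted_frames: dict mapping the index of the first reference frame of a
    group to the target predicted frame to insert after it."""
    second_sequence = []
    for index, frame in enumerate(first_sequence):
        second_sequence.append(frame)
        if index in predicted_frames:
            # Insert the target predicted frame between the first and second
            # reference frames of the associated group.
            second_sequence.append(predicted_frames[index])
    return second_sequence
```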
A scene diagram of determining a target prediction frame according to the embodiment of the present application is provided below with reference to fig. 12. Fig. 12 is a schematic diagram of a scene of determining a target prediction frame according to an embodiment of the present application. As shown in fig. 12, a first optical flow field corresponding to a first reference frame and a second optical flow field corresponding to a second reference frame are determined by using a RAFT model, the first reference frame is backward mapped based on the first optical flow field to obtain a first predicted frame, and the second reference frame is backward mapped based on the second optical flow field to obtain a second predicted frame.
And respectively determining a third context feature corresponding to the first reference frame and a fourth context feature corresponding to the second reference frame through a context feature extraction network (ContextNet), carrying out backward mapping on each feature map in the third context feature based on the first optical flow field to obtain a fifth context feature, and carrying out backward mapping on each feature map in the fourth context feature based on the second optical flow field to obtain a sixth context feature.
And inputting the fifth context feature, the sixth context feature, the first optical flow field, the second optical flow field, the first predicted frame and the second predicted frame into the U-NET network to obtain a reconstructed residual error and an occlusion weight in the frame prediction process, and determining the target predicted frame based on the reconstructed residual error, the occlusion weight, the first predicted frame and the second predicted frame.
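As an illustrative example only, the complete per-group prediction of fig. 12 can be composed from the sketches above. The callables raft, context_net, residual_net, fusion_net and head are hypothetical stand-ins for the RAFT optical flow model, the context feature extraction network, the residual-feature network of fig. 6, the U-NET style fusion network and the reconstruction head; backward_warp, warp_context_pyramid and compose_target_frame are the helpers sketched earlier:

```python
import torch


def predict_target_frame(first_ref, second_ref,
                         raft, context_net, residual_net, fusion_net, head):
    # First and second optical flow fields for the two reference frames.
    first_flow, second_flow = raft(first_ref, second_ref)
    # First and second predicted frames by backward mapping the reference frames.
    first_pred = backward_warp(first_ref, first_flow)
    second_pred = backward_warp(second_ref, second_flow)
    # Fifth and sixth context features: backward-mapped context pyramids.
    fifth_ctx = warp_context_pyramid(context_net(first_ref), first_flow)
    sixth_ctx = warp_context_pyramid(context_net(second_ref), second_flow)
    # Residual feature from the flows and predicted frames, then fusion.
    residual_feature = residual_net(
        torch.cat([first_flow, second_flow, first_pred, second_pred], dim=1))
    fusion_feature = fusion_net(residual_feature, fifth_ctx, sixth_ctx)
    reconstructed_residual, occlusion_weight = head(fusion_feature)
    return compose_target_frame(first_pred, second_pred,
                                occlusion_weight, reconstructed_residual)
```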
In the embodiment of the application, for each group of associated reference frames in a first frame sequence obtained by decoding, by determining a first predicted frame and a second predicted frame corresponding to a first reference frame and a second reference frame in each group of associated reference frames, a first optical flow field and a second optical flow field, and an occlusion weight and a reconstruction residual in a frame prediction process, occlusion information in the frame prediction process, detail information of each image frame, and optical flow field information can be fully considered, and problems of object jitter, edge blur and the like in the frame prediction process can be effectively solved, so that video definition is improved, and video viewing experience is improved.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a video data processing apparatus according to an embodiment of the present application. The video data processing apparatus provided by the embodiment of the present application includes:
a brightness determining module 41, configured to determine an initial frame sequence of a video to be processed, and determine brightness of each pixel point of each image frame in the initial frame sequence;
a frame sequence determining module 42, configured to perform frame extraction processing on the initial frame sequence based on brightness of each pixel point of each image frame in the initial frame sequence, and use the frame sequence after frame extraction as a first frame sequence;
an encoding module 43, configured to encode the first frame sequence to obtain encoded data of the video to be processed, where the encoded data carries a reference frame number of each reference frame, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process;
a sending module 44, configured to send the encoded data to a first device, so that the first device determines a second frame sequence based on the encoded data and the reference frames corresponding to the reference frame numbers, and determines to play a video based on the second frame sequence.
In some possible embodiments, the frame sequence determining module 42 is configured to:
for any image frame in the initial frame sequence, determining the brightness difference between each pixel point of the image frame and the corresponding pixel point of the previous image frame, and determining the total brightness difference between the image frame and the previous image frame based on the brightness difference corresponding to each pixel point of the image frame;
determining image frames with the corresponding total brightness difference larger than a first threshold value in the initial frame sequence as active frames, and determining image frames with the corresponding total brightness difference smaller than or equal to the first threshold value as static frames;
and performing frame extraction processing on the initial frame sequence based on the active frame and the static frame.
In some possible embodiments, the frame sequence determining module 42 is configured to:
determining a continuous active frame sequence and a continuous static frame sequence in the initial frame sequence;
and performing frame extraction processing on the continuous active frame sequence and the continuous static frame sequence in the initial frame sequence.
In some possible embodiments, the frame sequence determining module 42 is configured to:
determining an initial frame rate corresponding to the video to be processed, and dividing the initial frame sequence into at least one subframe sequence based on the initial frame rate;
determining a continuous active frame sequence and a continuous static frame sequence in each subframe sequence;
for each continuous active frame sequence, if the number of active frames in the continuous active frame sequence is greater than a second threshold, performing frame extraction processing on the continuous active frame sequence;
for each continuous static frame sequence, if the number of static frames in the continuous static frame sequence is greater than a third threshold, performing frame extraction processing on the continuous static frame sequence.
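As an illustrative example only, the frame extraction logic of the frame sequence determining module described above (brightness-difference classification, division into sub-frame sequences, threshold-based decimation of continuous runs) can be sketched as follows. The BT.601 luminance formula, the one-second sub-sequence length, the treatment of the first frame as static and the keep-every-other-frame decimation rule are assumptions of the sketch, not requirements of the embodiment:

```python
import numpy as np


def decimate_frame_sequence(frames, first_threshold, second_threshold,
                            third_threshold, initial_frame_rate):
    """frames: list of (H, W, 3) RGB arrays; returns the first frame sequence."""
    # Brightness of each pixel point of each image frame (assumed BT.601 luma).
    luma = [0.299 * f[..., 0] + 0.587 * f[..., 1] + 0.114 * f[..., 2] for f in frames]

    # Classify frames as active/static by the total brightness difference.
    active = [False]  # the first frame has no predecessor; treated as static here
    for prev, cur in zip(luma, luma[1:]):
        active.append(float(np.abs(cur - prev).sum()) > first_threshold)

    kept = []
    # One sub-frame sequence per second, based on the initial frame rate.
    for start in range(0, len(frames), initial_frame_rate):
        end = min(start + initial_frame_rate, len(frames))
        i = start
        while i < end:
            j = i
            while j < end and active[j] == active[i]:
                j += 1                  # frames i..j-1 form one continuous run
            run = list(range(i, j))
            limit = second_threshold if active[i] else third_threshold
            if len(run) > limit:
                run = run[::2]          # decimate the run (assumed rule)
            kept.extend(run)
            i = j
    return [frames[k] for k in kept]
```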
In a specific implementation, the video data processing apparatus may execute the implementation manners provided in the steps in fig. 2 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps, and are not described herein again.
Referring to fig. 14, fig. 14 is a schematic structural diagram of another video data processing apparatus according to an embodiment of the present application. The video data processing apparatus provided by the embodiment of the present application includes:
a decoding module 51, configured to acquire encoded data sent by a second device, and decode the encoded data to obtain a first frame sequence, where the first frame sequence is a frame sequence obtained by performing frame extraction processing on an initial frame sequence of a video to be processed by the second device;
a reference frame determining module 52, configured to determine, based on the reference frame sequence numbers carried in the encoded data, groups of associated reference frames in the reference frames corresponding to the first frame sequence, where each group of associated reference frames includes two reference frames adjacent to a target frame sequence extracted in the frame extraction process;
a frame prediction module 53, configured to determine, for each group of associated reference frames, a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, and an occlusion weight and a reconstructed residual in a frame prediction process based on a first reference frame and a second reference frame in the group of associated reference frames, and determine a target predicted frame corresponding to the group of associated reference frames based on the first predicted frame, the second predicted frame, the occlusion weight, and the reconstructed residual;
and a video determining module 54, configured to perform frame interpolation on the target prediction frames corresponding to each group of associated reference frames to obtain a second frame sequence, and obtain a played video based on the second frame sequence.
In some possible embodiments, for each group of the associated reference frames, the frame prediction module 53 is configured to:
determining a first optical flow field corresponding to the first reference frame and a second optical flow field corresponding to the second reference frame based on a first reference frame and a second reference frame in the group of associated reference frames;
and performing backward mapping on the first reference frame based on the first optical flow field to obtain a first prediction frame corresponding to the first reference frame, and performing backward mapping on the second reference frame based on the second optical flow field to obtain a second prediction frame corresponding to the second reference frame.
In some possible embodiments, for each group of the associated reference frames, the frame prediction module 53 is configured to:
performing feature extraction on a first reference frame in the group of associated reference frames to obtain first initial features, performing feature extraction on a second reference frame in the group of associated reference frames to obtain second initial features, and determining associated features corresponding to the first reference frame and the second reference frame based on the first initial features and the second initial features;
determining a first context feature of the first reference frame, and determining a first optical flow field corresponding to the first reference frame based on the first context feature and the association feature;
and determining a second context feature of the second reference frame, and determining a second optical flow field corresponding to the second reference frame based on the second context feature and the association feature.
In some possible embodiments, for each group of the associated reference frames, the frame prediction module 53 is configured to:
determining a third contextual characteristic of the first reference frame and a fourth contextual characteristic of the second reference frame;
and determining the occlusion weight and the reconstruction residual in the frame prediction process based on the first optical flow field, the second optical flow field, the first prediction frame, the second prediction frame, the third context feature and the fourth context feature.
In some possible embodiments, the frame prediction module 53 is configured to:
determining residual features based on the first optical flow field, the second optical flow field, the first predicted frame, and the second predicted frame;
determining a fusion feature based on the third context feature, the fourth context feature, and the residual feature;
and determining the occlusion weight and the reconstruction residual error in the frame prediction process based on the fusion characteristics.
In some possible embodiments, the third contextual feature and the fourth contextual feature comprise a plurality of feature maps; the frame prediction module 53 is configured to:
determining an optical flow field weight corresponding to each feature map in the third context features, and mapping the feature map backwards based on the optical flow field weight corresponding to the feature map and the first optical flow field to obtain a mapping feature map corresponding to the feature map;
determining a mapping feature map corresponding to each feature map in the third context features as a fifth context feature of the first reference frame;
determining an optical flow field weight corresponding to each feature map in the fourth context feature, and performing backward mapping on the feature map based on the optical flow field weight corresponding to the feature map and the second optical flow field to obtain a mapping feature map corresponding to the feature map;
determining a mapping feature map corresponding to each feature map in the fourth context features as a sixth context feature of the second reference frame;
determining a fusion feature based on the fifth context feature, the sixth context feature, and the residual feature.
In some possible embodiments, the frame prediction module 53 is configured to:
performing feature processing on the fusion features to obtain target features, and determining the number of channels corresponding to the target features;
determining the characteristic value of the target characteristic corresponding to the last channel as an occlusion weight;
and determining a reconstruction residual error based on the characteristic values of the target characteristic corresponding to other channels.
In a specific implementation, the video data processing apparatus may execute the implementation manners provided in the steps in fig. 3 through the built-in functional modules, which may specifically refer to the implementation manners provided in the steps, and are not described herein again.
Referring to fig. 15, fig. 15 is a schematic structural diagram of an electronic device provided in an embodiment of the present application. As shown in fig. 15, the electronic device 1000 in the present embodiment may include: the processor 1001, the network interface 1004, and the memory 1005. In addition, the electronic device 1000 may further include: a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a Display screen (Display) and a Keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory. The memory 1005 may optionally be at least one storage device located remotely from the processor 1001. As shown in fig. 15, the memory 1005, which is a kind of computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the electronic device 1000 shown in fig. 15, the network interface 1004 may provide a network communication function; the user interface 1003 is an interface for providing a user with input; and the processor 1001 may be configured to invoke a device control application stored in the memory 1005 to implement the video data processing method performed by the first device and/or the second device.
It should be understood that in some possible embodiments, the processor 1001 may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The memory may include both read-only memory and random access memory, and provides instructions and data to the processor. A portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In a specific implementation, the electronic device 1000 may execute, through each built-in functional module thereof, the implementation manners provided in each step in fig. 2 and/or fig. 3, which may be referred to specifically for the implementation manners provided in each step, and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and is executed by a processor to implement the method provided in each step in fig. 2 and/or fig. 3, which may specifically refer to the implementation manner provided in each step, and is not described herein again.
The computer-readable storage medium may be the video data processing apparatus and/or an internal storage unit of an electronic device provided in any of the foregoing embodiments, for example, a hard disk or a memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Memory Card (SMC), a Secure Digital (SD) card, a flash card (flash card), and the like, which are provided on the electronic device. The computer readable storage medium may further include a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), and the like. Further, the computer readable storage medium may also include both an internal storage unit and an external storage device of the electronic device. The computer-readable storage medium is used for storing the computer program and other programs and data required by the electronic device. The computer readable storage medium may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the methods provided by the steps of fig. 2 and/or fig. 3.
The terms "first", "second", and the like in the claims and in the description and drawings of the present application are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or electronic device that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or electronic device. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments. The term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The above disclosure is only for the purpose of illustrating the preferred embodiments of the present application and is not intended to limit the scope of the present application, which is defined by the appended claims.

Claims (15)

1. A method of video data processing, the method comprising:
determining an initial frame sequence of a video to be processed, and determining the brightness of each pixel point of each image frame in the initial frame sequence;
based on the brightness of each pixel point of each image frame in the initial frame sequence, performing frame extraction processing on the initial frame sequence, and taking the frame sequence after frame extraction as a first frame sequence;
coding the first frame sequence to obtain coded data of the video to be processed, wherein the coded data carry reference frame serial numbers of all reference frames, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process;
and sending the encoded data to a first device, so that the first device determines a second frame sequence based on the encoded data and the reference frames corresponding to the reference frame sequence numbers, and determines to play the video based on the second frame sequence.
2. The method of claim 1, wherein the decimating the initial frame sequence based on the intensities of the pixels of the image frames in the initial frame sequence comprises:
for any image frame in the initial frame sequence, determining the brightness difference between each pixel point of the image frame and the corresponding pixel point of the previous image frame, and determining the total brightness difference between the image frame and the previous image frame based on the brightness difference corresponding to each pixel point of the image frame;
determining image frames with corresponding total brightness difference larger than a first threshold value in the initial frame sequence as active frames, and determining image frames with corresponding total brightness difference smaller than or equal to the first threshold value as static frames;
and performing frame extraction processing on the initial frame sequence based on the active frame and the static frame.
3. The method of claim 2, wherein the decimating the initial frame sequence based on the active frame and the static frame comprises:
determining a continuous active frame sequence and a continuous static frame sequence in the initial frame sequence;
and performing frame extraction processing on the continuous active frame sequence and the continuous static frame sequence in the initial frame sequence.
4. The method of claim 3, wherein determining a sequence of consecutive active frames and a sequence of consecutive static frames in the initial frame sequence comprises:
determining an initial frame rate corresponding to the video to be processed, and dividing the initial frame sequence into at least one subframe sequence based on the initial frame rate;
determining a continuous active frame sequence and a continuous static frame sequence in each of the sub-frame sequences;
the frame-extracting processing of the continuous active frame sequence and the continuous static frame sequence in the initial frame sequence comprises:
for each continuous active frame sequence, if the number of active frames in the continuous active frame sequence is greater than a second threshold, performing frame extraction processing on the continuous active frame sequence;
and for each continuous static frame sequence, if the number of static frames in the continuous static frame sequence is greater than a third threshold value, performing frame extraction processing on the continuous static frame sequence.
5. A method of video data processing, the method comprising:
acquiring encoded data sent by second equipment, and decoding the encoded data to obtain a first frame sequence, wherein the first frame sequence is a frame sequence obtained by performing frame extraction processing on an initial frame sequence of a video to be processed by the second equipment;
determining each group of associated reference frames in the reference frames corresponding to the first frame sequence based on each reference frame sequence number carried by the encoded data, wherein each group of associated reference frames comprises two reference frames adjacent to a target frame sequence extracted in the frame extraction process;
for each group of associated reference frames, determining a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame, and an occlusion weight and a reconstruction residual in a frame prediction process based on a first reference frame and a second reference frame in the group of associated reference frames, and determining a target predicted frame corresponding to the group of associated reference frames based on the first predicted frame, the second predicted frame, the occlusion weight and the reconstruction residual;
and performing frame interpolation processing on the target prediction frames corresponding to the associated reference frames to obtain a second frame sequence, and obtaining a played video based on the second frame sequence.
6. The method of claim 5, wherein for each set of the associated reference frames, determining a first predicted frame corresponding to the first reference frame and a second predicted frame corresponding to the second reference frame based on the first reference frame and the second reference frame in the set of associated reference frames comprises:
determining a first optical flow field corresponding to the first reference frame and a second optical flow field corresponding to the second reference frame based on the first reference frame and the second reference frame in the group of associated reference frames;
and performing backward mapping on the first reference frame based on the first optical flow field to obtain a first prediction frame corresponding to the first reference frame, and performing backward mapping on the second reference frame based on the second optical flow field to obtain a second prediction frame corresponding to the second reference frame.
7. The method of claim 6, wherein for each group of the associated reference frames, the determining a first optical flow field corresponding to the first reference frame and a second optical flow field corresponding to the second reference frame based on a first reference frame and a second reference frame in the group of associated reference frames comprises:
performing feature extraction on a first reference frame in the group of associated reference frames to obtain first initial features, performing feature extraction on a second reference frame in the group of associated reference frames to obtain second initial features, and determining associated features corresponding to the first reference frame and the second reference frame based on the first initial features and the second initial features;
determining a first context feature of the first reference frame, and determining a first optical flow field corresponding to the first reference frame based on the first context feature and the association feature;
and determining a second context feature of the second reference frame, and determining a second optical flow field corresponding to the second reference frame based on the second context feature and the association feature.
8. The method of claim 6, wherein determining occlusion weights and reconstruction residuals in a frame prediction process based on a first reference frame and a second reference frame in each set of associated reference frames for each set of associated reference frames comprises:
determining a third contextual feature of the first reference frame, determining a fourth contextual feature of the second reference frame;
and determining occlusion weight and a reconstruction residual in a frame prediction process based on the first optical flow field, the second optical flow field, the first prediction frame, the second prediction frame, the third context feature and the fourth context feature.
9. The method of claim 8, wherein the determining occlusion weights and reconstructed residuals in a frame prediction process based on the first optical flow field, the second optical flow field, the first predicted frame, the second predicted frame, the third contextual feature, and the fourth contextual feature comprises:
determining residual features based on the first optical flow field, the second optical flow field, the first predicted frame, and the second predicted frame;
determining a fused feature based on the third context feature, the fourth context feature, and the residual feature;
and determining occlusion weight and reconstruction residual error in the frame prediction process based on the fusion characteristics.
10. The method of claim 9, wherein the third contextual feature and the fourth contextual feature comprise a plurality of feature maps; determining a fused feature based on the third context feature, the fourth context feature, and the residual feature, comprising:
determining an optical flow field weight corresponding to each feature map in the third context features, and mapping the feature map backwards based on the optical flow field weight corresponding to the feature map and the first optical flow field to obtain a mapping feature map corresponding to the feature map;
determining a mapping feature map corresponding to each feature map in the third context features as a fifth context feature of the first reference frame;
determining an optical flow field weight corresponding to each feature map in the fourth context features, and mapping the feature map backwards based on the optical flow field weight corresponding to the feature map and the second optical flow field to obtain a mapping feature map corresponding to the feature map;
determining a mapping feature map corresponding to each feature map in the fourth context features as a sixth context feature of the second reference frame;
determining a fused feature based on the fifth contextual feature, the sixth contextual feature, and the residual feature.
11. The method of claim 9, wherein determining occlusion weights and reconstruction residuals in a frame prediction process based on the fused features comprises:
performing feature processing on the fusion features to obtain target features, and determining the number of channels corresponding to the target features;
determining a feature value of the target feature corresponding to the last channel as an occlusion weight;
and determining a reconstruction residual error based on the characteristic values of the target characteristic corresponding to other channels.
12. A video data processing apparatus, characterized in that the apparatus comprises:
the system comprises a brightness determining module, a processing module and a processing module, wherein the brightness determining module is used for determining an initial frame sequence of a video to be processed and determining the brightness of each pixel point of each image frame in the initial frame sequence;
a frame sequence determining module, configured to perform frame extraction processing on the initial frame sequence based on brightness of each pixel point of each image frame in the initial frame sequence, and use the frame sequence after frame extraction as a first frame sequence;
the encoding module is used for encoding the first frame sequence to obtain encoded data of the video to be processed, the encoded data carries reference frame serial numbers of all reference frames, and each reference frame is an image frame adjacent to a target frame sequence extracted in the frame extraction process;
and the sending module is used for sending the coded data to first equipment so that the first equipment determines a second frame sequence based on the coded data and the reference frames corresponding to the reference frame serial numbers, and determines to play the video based on the second frame sequence.
13. A video data processing apparatus, characterized in that the apparatus comprises:
the decoding module is used for acquiring encoded data sent by second equipment and decoding the encoded data to obtain a first frame sequence, wherein the first frame sequence is a frame sequence obtained by performing frame extraction processing on an initial frame sequence of a video to be processed by the second equipment;
a reference frame determining module, configured to determine, based on each reference frame sequence number carried by the encoded data, each group of associated reference frames in the reference frames corresponding to the first frame sequence, where each group of associated reference frames includes two reference frames adjacent to a target frame sequence extracted in the frame extraction process;
the frame prediction module is used for determining a first prediction frame corresponding to the first reference frame and a second prediction frame corresponding to the second reference frame as well as an occlusion weight and a reconstruction residual in a frame prediction process based on the first reference frame and the second reference frame in each group of associated reference frames, and determining a target prediction frame corresponding to the group of associated reference frames based on the first prediction frame, the second prediction frame, the occlusion weight and the reconstruction residual;
and the video determining module is used for performing frame interpolation processing on the target prediction frames corresponding to the associated reference frames to obtain a second frame sequence, and obtaining a played video based on the second frame sequence.
14. An electronic device comprising a processor and a memory, the processor and the memory being interconnected;
the memory is used for storing a computer program;
the processor is configured to perform the method of any of claims 1 to 4 or the method of any of claims 5 to 11 when the computer program is invoked.
15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which is executed by a processor to implement the method of any of claims 1 to 4 or the method of any of claims 5 to 11.
CN202110874693.7A 2021-07-30 2021-07-30 Video data processing method, device, equipment and storage medium Pending CN113556582A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110874693.7A CN113556582A (en) 2021-07-30 2021-07-30 Video data processing method, device, equipment and storage medium
PCT/CN2021/142584 WO2023005140A1 (en) 2021-07-30 2021-12-29 Video data processing method, apparatus, device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110874693.7A CN113556582A (en) 2021-07-30 2021-07-30 Video data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113556582A true CN113556582A (en) 2021-10-26

Family

ID=78105050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110874693.7A Pending CN113556582A (en) 2021-07-30 2021-07-30 Video data processing method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN113556582A (en)
WO (1) WO2023005140A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117729303A (en) * 2023-04-17 2024-03-19 书行科技(北京)有限公司 Video frame extraction method, device, computer equipment and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10694244B2 (en) * 2018-08-23 2020-06-23 Dish Network L.L.C. Automated transition classification for binge watching of content
CN112866799B (en) * 2020-12-31 2023-08-11 百果园技术(新加坡)有限公司 Video frame extraction processing method, device, equipment and medium
CN113038176B (en) * 2021-03-19 2022-12-13 北京字跳网络技术有限公司 Video frame extraction method and device and electronic equipment
CN113556582A (en) * 2021-07-30 2021-10-26 海宁奕斯伟集成电路设计有限公司 Video data processing method, device, equipment and storage medium

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060182179A1 (en) * 2005-02-14 2006-08-17 Samsung Electronics Co., Ltd. Video coding and decoding methods with hierarchical temporal filtering structure, and apparatus for the same
CN104618679A (en) * 2015-03-13 2015-05-13 南京知乎信息科技有限公司 Method for extracting key information frame from monitoring video
CN107027029A (en) * 2017-03-01 2017-08-08 四川大学 High-performance video coding improved method based on frame rate conversion
US20190124337A1 (en) * 2017-07-07 2019-04-25 Kakadu R & D Pty Ltd. Fast, high quality optical flow estimation from coded video
CN109905717A (en) * 2017-12-11 2019-06-18 四川大学 A kind of H.264/AVC Encoding Optimization based on Space-time domain down-sampling and reconstruction
CN111901598A (en) * 2020-06-28 2020-11-06 华南理工大学 Video decoding and encoding method, device, medium and electronic equipment
CN112104830A (en) * 2020-08-13 2020-12-18 北京迈格威科技有限公司 Video frame insertion method, model training method and corresponding device
CN112184779A (en) * 2020-09-17 2021-01-05 无锡安科迪智能技术有限公司 Method and device for processing interpolation image
CN112532998A (en) * 2020-12-01 2021-03-19 网易传媒科技(北京)有限公司 Method, device and equipment for extracting video frame and readable storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023005140A1 (en) * 2021-07-30 2023-02-02 海宁奕斯伟集成电路设计有限公司 Video data processing method, apparatus, device, and storage medium
CN114286123A (en) * 2021-12-23 2022-04-05 海宁奕斯伟集成电路设计有限公司 Live broadcast method and device of television program
CN114449280A (en) * 2022-03-30 2022-05-06 浙江智慧视频安防创新中心有限公司 Video coding and decoding method, device and equipment
CN114449280B (en) * 2022-03-30 2022-10-04 浙江智慧视频安防创新中心有限公司 Video coding and decoding method, device and equipment
WO2024067627A1 (en) * 2022-09-30 2024-04-04 中国电信股份有限公司 Machine vision-oriented video data processing method and related device
CN117315574A (en) * 2023-09-20 2023-12-29 北京卓视智通科技有限责任公司 Blind area track completion method, blind area track completion system, computer equipment and storage medium
CN117315574B (en) * 2023-09-20 2024-06-07 北京卓视智通科技有限责任公司 Blind area track completion method, blind area track completion system, computer equipment and storage medium

Also Published As

Publication number Publication date
WO2023005140A1 (en) 2023-02-02

Similar Documents

Publication Publication Date Title
CN113556582A (en) Video data processing method, device, equipment and storage medium
CN110324664B (en) Video frame supplementing method based on neural network and training method of model thereof
US10977809B2 (en) Detecting motion dragging artifacts for dynamic adjustment of frame rate conversion settings
CN111193923B (en) Video quality evaluation method and device, electronic equipment and computer storage medium
US10032261B2 (en) Methods, systems and apparatus for over-exposure correction
CN112598579B (en) Monitoring scene-oriented image super-resolution method, device and storage medium
CN110751649B (en) Video quality evaluation method and device, electronic equipment and storage medium
CN110136055B (en) Super resolution method and device for image, storage medium and electronic device
CN111681177B (en) Video processing method and device, computer readable storage medium and electronic equipment
CN110139147B (en) Video processing method, system, mobile terminal, server and storage medium
CN114339030B (en) Network live video image stabilizing method based on self-adaptive separable convolution
CN113688907A (en) Model training method, video processing method, device, equipment and storage medium
CN115293994B (en) Image processing method, image processing device, computer equipment and storage medium
Saha et al. Perceptual video quality assessment: The journey continues!
CN106603885B (en) Method of video image processing and device
Agarwal et al. Compressing video calls using synthetic talking heads
US20220335560A1 (en) Watermark-Based Image Reconstruction
Kim et al. Video display field communication: Practical design and performance analysis
US20100110287A1 (en) Method and apparatus for modeling film grain noise
CN117830077A (en) Image processing method and device and electronic equipment
CN115706810A (en) Video frame adjusting method and device, electronic equipment and storage medium
Lee et al. DCT‐Based HDR Exposure Fusion Using Multiexposed Image Sensors
CN115953597A (en) Image processing method, apparatus, device and medium
CN113613024A (en) Video preprocessing method and device
CN114693551B (en) Image processing method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 263, block B, science and technology innovation center, 128 Shuanglian Road, Haining Economic Development Zone, Haining City, Jiaxing City, Zhejiang Province, 314400

Applicant after: Haining yisiwei IC Design Co.,Ltd.

Applicant after: Beijing ESWIN Computing Technology Co.,Ltd.

Address before: Room 263, block B, science and technology innovation center, 128 Shuanglian Road, Haining Economic Development Zone, Haining City, Jiaxing City, Zhejiang Province, 314400

Applicant before: Haining yisiwei IC Design Co.,Ltd.

Applicant before: Beijing yisiwei Computing Technology Co.,Ltd.

CB02 Change of applicant information
RJ01 Rejection of invention patent application after publication

Application publication date: 20211026

RJ01 Rejection of invention patent application after publication