CN118678068A - Video encoding method, video encoding device, electronic equipment and storage medium - Google Patents
Video encoding method, video encoding device, electronic equipment and storage medium
- Publication number
- CN118678068A (application CN202310293622.7A)
- Authority
- CN
- China
- Prior art keywords
- target
- video
- picture group
- coding
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The embodiment of the application discloses a video coding method, a video coding device, electronic equipment and a storage medium. A target video segment is encoded at a constant code rate by a plurality of precoders, the coding performance parameter of each precoder is determined from the resulting sample coding data, and a target picture group structure is determined from a plurality of candidate picture group structures according to these coding performance parameters. The target picture group structure obtained in this way is the candidate picture group structure with the best rate-distortion performance, so the candidate picture group structures are effectively screened and matched. The target picture group structure is sent to the target encoder for configuration, and the target video segment is then encoded. Even when target video segments with different characteristics and different scenes are encoded, a suitable target picture group structure can be selected adaptively, which effectively improves the rate-distortion performance of video coding. The method can be widely applied in technical fields such as cloud technology and video processing.
Description
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a video encoding method, apparatus, electronic device, and storage medium.
Background
With the development of digital media technology and computer technology, video is applied in various fields, such as mobile communication, network identification and network television, and brings great convenience to people's entertainment and daily life. In the related art, a fixed picture group structure is generally adopted when a video to be encoded is encoded; when the content of the video to be encoded changes drastically, the rate-distortion performance of the video encoding decreases.
Disclosure of Invention
The following is a summary of the subject matter of the detailed description of the application. This summary is not intended to limit the scope of the claims.
The embodiment of the application provides a video coding method, a video coding device, electronic equipment and a storage medium, which can improve the rate distortion performance during video coding.
In one aspect, an embodiment of the present application provides a video encoding method, including:
Obtaining a target video segment, and encoding the target video segment at a constant code rate based on a plurality of precoders to obtain sample coding data output by each precoder, wherein each precoder is configured with a preset candidate picture group structure, and the candidate picture group structures of any two precoders are different;
Determining coding performance parameters of each precoder according to the sample coding data, determining a target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, and sending the target picture group structure to a target encoder for configuration;
and carrying out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment.
On the other hand, the embodiment of the application also provides a video coding method, which comprises the following steps:
acquiring a target video segment and a target picture group structure, and sending the target picture group structure to a target encoder for configuration;
Encoding the target video segment based on the target encoder to obtain target encoding data of the target video segment;
The target picture group structure is obtained as follows: the target video segment is encoded at a constant code rate based on a plurality of precoders to obtain sample coding data output by each precoder, coding performance parameters of each precoder are determined according to the sample coding data, and the target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters; each precoder is configured with a preset candidate picture group structure, and the candidate picture group structures of any two precoders are different.
On the other hand, the embodiment of the application also provides a video coding device, which comprises:
The first encoding module is used for obtaining a target video segment, and encoding the target video segment at a constant code rate based on a plurality of precoders to obtain sample coding data output by each precoder, wherein each precoder is configured with a preset candidate picture group structure, and the candidate picture group structures of any two precoders are different;
A first picture group structure configuration module, configured to determine coding performance parameters of each precoder according to the sample coding data, determine a target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, and send the target picture group structure to a target encoder for configuration;
And the second coding module is used for coding the target video segment based on the target encoder to obtain target coding data of the target video segment.
Further, the first encoding module is specifically configured to:
determining the first frame length of each candidate picture group structure, and calculating the target common multiple of a plurality of first frame lengths to obtain a second frame length;
And acquiring an original video segment, and extracting the target video segment from the original video segment according to the second frame length.
Further, the first encoding module is specifically configured to:
acquiring a video to be encoded, and determining a plurality of first scene change frames in the video to be encoded;
And dividing the video to be encoded into segments according to the first scene change frame to obtain a plurality of original video segments corresponding to different video scenes.
Further, the first encoding module is specifically configured to:
carrying out framing treatment on the video to be encoded to obtain a plurality of candidate video frames;
Traversing a plurality of candidate video frames, determining a first image similarity between the current candidate video frame and a first adjacent video frame and a second image similarity between the current candidate video frame and a second adjacent video frame, wherein the first adjacent video frame is a previous video frame adjacent to the current candidate video frame, and the second adjacent video frame is a next video frame adjacent to the current candidate video frame;
And when the difference value between the second image similarity and the first image similarity is larger than or equal to a preset difference value threshold, determining the current candidate video frame or the second adjacent video frame as the first scene change frame of the video to be encoded.
Further, the first picture group structure configuration module is further configured to:
When the target video segment is extracted from the original video segment according to the second frame length and a residual video segment smaller than the second frame length exists, determining the target picture group structure corresponding to the residual video segment according to the target picture group structure corresponding to the target video segment;
The second encoding module is further configured to:
And carrying out coding processing on the residual video segments based on the target encoder configured with the target picture group structure corresponding to the residual video segments.
Further, the first picture group structure configuration module is specifically configured to:
Determining the segment similarity between the residual video segments and each target video segment, and determining the target picture group structure corresponding to the residual video segments from the target picture group structures corresponding to a plurality of target video segments according to the segment similarity;
or determining a third frame length of the remaining video segments, and taking the target picture group structure of which the first frame length corresponding to the target video segments is smaller than or equal to the third frame length as the target picture group structure corresponding to the remaining video segments.
Further, the first encoding module is specifically configured to:
Determining a plurality of second scene change frames in the target video clip;
Determining scene change degree information of the target video segment according to a plurality of second scene change frames, wherein the scene change degree information is used for indicating intensity degree of scene change of the target video segment;
and determining a plurality of candidate picture group structures according to the scene change degree information, and respectively sending the plurality of candidate picture group structures to each precoder for configuration, wherein the first frame length of the candidate picture group structures decreases as the intensity indicated by the scene change degree information increases.
Further, the first encoding module is specifically configured to:
Determining the inter-frame distance between two adjacent second scene change frames, calculating the average value of a plurality of inter-frame distances to obtain the inter-frame average distance, and taking the inter-frame average distance as the scene change degree information of the target video segment;
Or determining the frame number of a plurality of second scene change frames, and taking the frame number as scene change degree information of the target video segment.
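As a non-limiting illustration of the two configurations above, the following Python sketch derives scene change degree information from the second scene change frames (either the average inter-frame distance or the frame count) and then picks shorter candidate first frame lengths as the indicated intensity increases. The thresholds and the candidate length sets are assumptions made for illustration only; they are not values taken from this application.

```python
from typing import List

def scene_change_degree(change_frames: List[int], use_average_distance: bool = True) -> float:
    """Scene change degree information for one target video segment.

    Either the average distance between adjacent second scene change frames
    (a smaller distance indicates more intense change) or the number of
    second scene change frames (a larger count indicates more intense change).
    """
    if use_average_distance:
        if len(change_frames) < 2:
            return float("inf")  # too few change frames: treat the segment as calm
        gaps = [b - a for a, b in zip(change_frames, change_frames[1:])]
        return sum(gaps) / len(gaps)
    return float(len(change_frames))

def candidate_first_frame_lengths(degree: float, use_average_distance: bool = True) -> List[int]:
    """Choose candidate picture group (first frame) lengths.

    Shorter candidate lengths are chosen as the indicated intensity grows; the
    thresholds and the length sets below are illustrative assumptions only.
    """
    intense = degree < 16 if use_average_distance else degree > 8
    return [4, 8] if intense else [8, 16]
```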
Further, the first picture group structure configuration module is specifically configured to:
When each precoder is configured with the same target code rate, determining a peak signal-to-noise ratio of each precoder according to the sample coding data, using the peak signal-to-noise ratio as the coding performance parameter, comparing the coding performance parameters corresponding to each precoder, and determining a target picture group structure from a plurality of candidate picture group structures according to the comparison result of the coding performance parameters;
Or when each precoder is configured with different target code rates, determining peak signal-to-noise ratios of each precoder according to the sample coding data, determining the number of code stream bits of each precoder according to the sample coding data, determining target weights of the number of code stream bits, obtaining weighted bit numbers according to products of the target weights and the number of code stream bits, obtaining the coding performance parameters according to the sum of the peak signal-to-noise ratios and the weighted bit numbers, comparing the coding performance parameters corresponding to each precoder, and determining a target picture group structure from a plurality of candidate picture group structures according to comparison results of the coding performance parameters.
Further, the first picture group structure configuration module is specifically configured to:
decoding the sample coded data to obtain a reference video segment;
Determining a first pixel value of an image in the target video segment and a second pixel value of an image in the reference video segment;
Calculating a mean square value between the first pixel value and the corresponding second pixel value;
And determining the pixel bit number of the image in the target video segment, and determining the peak signal-to-noise ratio of each precoder according to the mean square value and the pixel bit number.
Further, the number of the target video clips is plural, and the second encoding module is specifically configured to:
Generating segment identifiers corresponding to the target video segments, and marking the corresponding target picture group structures according to the segment identifiers;
And determining the current target picture group structure corresponding to the target video segment according to the segment identification based on the target encoder, and carrying out encoding processing on the current target video segment according to the corresponding target picture group structure to obtain target encoding data of the target video segment.
Further, the first encoding module is specifically configured to:
performing downsampling processing on the target video segment to obtain a downsampled video segment;
and carrying out constant code rate coding processing on the downsampled video segments based on a plurality of precoders to obtain sample coding data output by each precoder.
On the other hand, the embodiment of the application also provides a video coding device, which comprises:
The second picture group structure configuration module is used for acquiring a target video fragment and a target picture group structure and sending the target picture group structure to a target encoder for configuration;
The third coding module is used for coding the target video segment based on the target encoder to obtain target coding data of the target video segment;
The target picture group structure is obtained as follows: the target video segment is encoded at a constant code rate based on a plurality of precoders to obtain sample coding data output by each precoder, coding performance parameters of each precoder are determined according to the sample coding data, and the target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters; each precoder is configured with a preset candidate picture group structure, and the candidate picture group structures of any two precoders are different.
On the other hand, the embodiment of the application also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the video coding method when executing the computer program.
In another aspect, an embodiment of the present application further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the video encoding method described above.
In another aspect, embodiments of the present application also provide a computer program product comprising a computer program stored in a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the computer device performs the video encoding method described above.
The embodiment of the application at least includes the following beneficial effects: a target video segment is obtained and encoded at a constant code rate based on a plurality of precoders to obtain the sample coding data output by each precoder; the coding performance parameter of each precoder is determined according to the sample coding data, and a target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters. The target picture group structure obtained in this way is the candidate picture group structure with the best rate-distortion performance, so the candidate picture group structures are effectively screened and matched. The target picture group structure is then sent to the target encoder for configuration, and the target video segment is encoded based on the target encoder to obtain the target coding data of the target video segment.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The accompanying drawings are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate and do not limit the application.
FIG. 1 is a schematic illustration of an alternative implementation environment provided by an embodiment of the present application;
FIG. 2 is a schematic illustration of an alternative implementation environment provided by an embodiment of the present application;
fig. 3 is an alternative flowchart of a video encoding method according to an embodiment of the present application;
- FIG. 4 is a schematic diagram of an alternative candidate picture group structure according to an embodiment of the present application;
- FIG. 5 is a schematic diagram of another alternative candidate picture group structure according to an embodiment of the present application;
fig. 6 is a schematic diagram of segment division when encoding a video to be encoded according to an embodiment of the present application;
FIG. 7 is a schematic diagram of an alternative configuration of a plurality of candidate video frames according to an embodiment of the present application;
Fig. 8 is a flowchart of another alternative video encoding method according to an embodiment of the present application;
Fig. 9 is an optional complete flowchart of a video encoding method according to an embodiment of the present application;
fig. 10 is a schematic diagram of another alternative complete flow of a video encoding method according to an embodiment of the present application;
fig. 11 is a schematic diagram of another alternative complete flow of the video encoding method according to the embodiment of the present application;
fig. 12 is a schematic diagram of another alternative complete flow of the video encoding method according to the embodiment of the present application;
fig. 13 is a schematic structural diagram of an alternative first video encoding device according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an alternative second video encoding device according to an embodiment of the present application;
Fig. 15 is a partial block diagram of a terminal according to an embodiment of the present application;
Fig. 16 is a partial block diagram of a server according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In the embodiments of the present application, when processing is performed on data related to the characteristics of a target object, such as attribute information or an attribute information set of the target object, the permission or consent of the target object is obtained first, and the collection, use and processing of the data comply with the relevant laws, regulations and standards of the relevant countries and regions. The target object may be a user. In addition, when an embodiment of the present application needs to acquire attribute information of the target object, the separate permission or separate consent of the target object is obtained through a pop-up window, a jump to a confirmation page, or the like; only after the separate permission or separate consent of the target object has been explicitly obtained is the target-object-related data necessary for the normal operation of the embodiment acquired.
In order to facilitate understanding of the technical solution provided by the embodiments of the present application, some key terms used in the embodiments of the present application are explained here:
Cloud technology (Cloud technology) refers to a hosting technology that unifies hardware, software, network and other resources in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data. Cloud technology is a general term for the network technology, information technology, integration technology, management platform technology, application technology and the like applied on the basis of the cloud computing business model; it can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support. Background services of technical network systems require large amounts of computing and storage resources, for example video websites, picture websites and other portal websites. With the rapid development and application of the internet industry, each item may have its own identification mark in the future, which needs to be transmitted to a background system for logical processing; data of different levels will be processed separately, and all kinds of industry data require strong system backing support, which can only be realized through cloud computing.
Group of pictures: also referred to as GOP (Group Of Pictures), the basic unit in which an encoder encodes a video sequence. An encoded video sequence can be seen as a combination of successive encoded GOPs. The GOP structure specifies the number of frames contained in the GOP (its length) and the reference relationships between frames.
P frame: Predictive Picture (predictively coded image frame), a type of video frame that must rely on a preceding or following frame to be encoded and decoded.
B frame: Bidirectionally Predicted Picture (bi-directionally predictively coded image frame), a type of video frame that must rely on both preceding and following frames to be encoded and decoded, and generally achieves a higher compression rate than a P frame.
CBR: Constant Bit Rate, a rate control scheme under which the output bitrate of the encoder remains substantially constant over a period of time.
In current mainstream encoder implementations, video is encoded in units of picture groups. The picture group structure specifies the frame types of the frames in the picture group and their reference relationships. There is considerable freedom in designing a picture group structure: any structure that conforms to the syntax of the coding standard can be used. Typically, once the picture group structure is determined, it applies to all frames in a given video to be encoded.
But different group of pictures structures may bring about different rate-distortion performance. For different videos to be encoded or different video clips in the same video to be encoded, the picture group structure with the optimal rate distortion performance may not be consistent. Coding the same video to be coded using different picture group structures may exhibit more than 10% rate distortion performance differences.
In the related art, a fixed picture group structure is generally adopted when a video to be encoded is encoded; for example, whether to restart a picture group is usually decided according to a shot detection result, and the length of the picture group is not modified. When the content of the video to be encoded changes drastically, the rate-distortion performance of the video encoding decreases.
Based on the above, the embodiment of the application provides a video coding method, a video coding device, an electronic device and a storage medium, which can improve the rate distortion performance during video coding.
Referring to fig. 1, fig. 1 is a schematic diagram of an alternative implementation environment provided in an embodiment of the present application, where the implementation environment includes a terminal 101 or a server 102.
For example, the terminal 101 or the server 102 may be provided with a target encoder and a plurality of precoders. After the target video segment is acquired, it is encoded at a constant code rate based on the plurality of precoders to obtain the sample coding data output by each precoder, where each precoder is configured with a preset candidate picture group structure and the candidate picture group structures of any two precoders are different. The coding performance parameter of each precoder is determined according to the sample coding data, the target picture group structure is determined from the plurality of candidate picture group structures according to the coding performance parameters and sent to the target encoder for configuration, and the target video segment is then encoded based on the target encoder to obtain the target coding data of the target video segment.
In addition, referring to fig. 2, fig. 2 is a schematic diagram of another alternative implementation environment provided in an embodiment of the present application, where the implementation environment includes a terminal 101 and a server 102, where the terminal 101 and the server 102 are connected through a communication network.
Illustratively, the terminal 101 may be provided with a target encoder, the server 102 may be provided with a plurality of precoders, the server 102 may obtain a target video segment sent by the terminal, and perform coding processing with a constant code rate on the target video segment based on the plurality of precoders, so as to obtain sample coded data output by each precoder; determining coding performance parameters of each precoder according to the sample coding data, determining a target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, transmitting the target picture group structure to the terminal 101, receiving the target picture group structure by the terminal 101, configuring the target picture group structure to the target encoder, and performing coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment.
Based on the implementation environment shown in fig. 1 or fig. 2, the target video segment is obtained and encoded at a constant code rate based on a plurality of precoders to obtain the sample coding data output by each precoder; the coding performance parameter of each precoder is then determined according to the sample coding data, and the target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters. The target picture group structure obtained in this way is the candidate picture group structure with the best rate-distortion performance, so the candidate picture group structures are effectively screened and matched. The target picture group structure is further sent to the target encoder for configuration, and the target video segment is encoded based on the target encoder to obtain the target coding data of the target video segment.
The server 102 may be an independent physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs (Content Delivery Networks), big data and artificial intelligence platforms. In addition, the server 102 may also be a node server in a blockchain network.
The terminal 101 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a vehicle-mounted terminal, etc. The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, and embodiments of the present application are not limited herein.
The method provided by the embodiment of the application can be applied to various technical fields including but not limited to the technical fields of cloud technology, video processing and the like.
Referring to fig. 3, fig. 3 is an optional flowchart of a video encoding method according to an embodiment of the present application, where the video encoding method may be performed by a terminal, or may be performed by a server, or may be performed by a terminal and a server in cooperation, and the video encoding method includes, but is not limited to, the following steps 301 to 303.
Step 301: obtaining a target video segment, and encoding the target video segment at a constant code rate based on a plurality of precoders to obtain sample coding data output by each precoder.
The target video clip is a video clip to be encoded, and in one possible implementation, the target video clip may be a video clip stored locally by the terminal when the terminal performs video encoding, or may be a video clip collected by the terminal and sent to the server when the server performs video encoding.
The precoder is a preconfigured encoder used to pre-encode the target video segment; that is, the encoding result produced by a precoder is not the formal encoding result. There may be multiple precoders, each configured with a preset candidate picture group structure, and the candidate picture group structures of any two precoders are different. A candidate picture group structure is a picture group structure preconfigured in a precoder; the number and the specific form of the candidate picture group structures can be set according to the actual encoding requirements.
For example, the number of candidate picture group structures may be two. Referring to fig. 4, fig. 4 is an optional schematic diagram of a candidate picture group structure provided in an embodiment of the present application. The frame length of the candidate picture group structure shown in fig. 4 is 8 and the structure is divided into three layers; frames a-g are B frames, frame h is a P frame, and frame θ is a video frame of the previous picture group structure. The h frame cannot reference frames in the same picture group structure; it can only reference the θ frame belonging to the previous picture group structure. During encoding, the encoder encodes layer by layer upwards from the lowest layer of the picture group structure. In addition, referring to fig. 5, fig. 5 is another alternative schematic diagram of a candidate picture group structure according to an embodiment of the present application, which is similar to the candidate picture group structure shown in fig. 4, except that its frame length is 16 and the structure is divided into four layers; frames a-o are B frames, frame p is a P frame, and frame θ is a video frame of the previous picture group structure.
In one possible implementation, the precoder encodes the target video segment at a constant code rate; for example, CBR mode may be used. Because the target video segment is encoded at a constant code rate, the number of bits consumed to encode the whole target video segment is essentially the same for each precoder, which provides a feasible basis for determining the target picture group structure from the plurality of candidate picture group structures according to the coding performance parameters and makes the resulting target picture group structure more accurate and reliable.
In one possible implementation, when encoding the target video segment at a constant code rate based on the plurality of precoders to obtain the sample coding data output by each precoder, the target video segment may first be downsampled to obtain a downsampled video segment, and the downsampled video segment is then encoded at a constant code rate based on the plurality of precoders to obtain the sample coding data output by each precoder.
Downsampling the target video segment means reducing the image size of the target video segment, thereby reducing its resolution. For example, the downsampling may halve both the width and the height; of course, other downsampling ratios may also be used, which is not limited by the embodiments of the present application.
Downsampling the target video segment before encoding it with the precoders can effectively increase the speed of pre-encoding, and because the precoders only pre-encode the target video segment, downsampling it does not affect the final encoding result of the target video segment.
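A rough sketch of this pre-encoding step is shown below. The Precoder interface (its encode_cbr method), the nearest-neighbour downsampling helper and the fixed 2x factor are all assumptions made for illustration; they do not correspond to a specific encoder library.

```python
from typing import Dict, List, Protocol, Sequence
import numpy as np

class Precoder(Protocol):
    """Assumed pre-encoder interface: encodes frames at a constant bitrate (CBR)."""
    gop_structure: object
    def encode_cbr(self, frames: Sequence[np.ndarray], target_bitrate: int) -> bytes: ...

def downsample(frames: Sequence[np.ndarray]) -> List[np.ndarray]:
    # Halve width and height by keeping every other pixel (nearest-neighbour);
    # a production system would use a proper resampling filter.
    return [f[::2, ::2] for f in frames]

def pre_encode_segment(frames: Sequence[np.ndarray],
                       precoders: Sequence[Precoder],
                       target_bitrate: int) -> Dict[int, bytes]:
    """Run every precoder on the (downsampled) target video segment and
    collect the sample coding data each one outputs."""
    small = downsample(frames)
    return {i: p.encode_cbr(small, target_bitrate) for i, p in enumerate(precoders)}
```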
Step 302: determining coding performance parameters of each precoder according to the sample coding data, determining a target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, and sending the target picture group structure to the target encoder for configuration.
The coding performance parameter indicates how well a precoder is adapted to the target video segment, i.e. the rate-distortion performance obtained after the target video segment is encoded by that precoder. The sample coding data is the data output by the precoder after encoding each frame of the target video segment. The coding performance parameter of each precoder can be determined by comparing the sample coding data with the original target video segment and measuring the error introduced by that precoder's encoding; this error can serve as the coding performance parameter of the precoder.
The target encoder is the encoder that performs the formal encoding of the target video segment, i.e. the encoded data output by the target encoder is the final coding data of the target video segment. In some application scenarios, the target encoder may be deployed in the same device (or the same application) as the precoders; for example, the target encoder and the precoders may both be deployed on the terminal, or both on the server. In other application scenarios, the target encoder may be deployed on a different device from the precoders; for example, the target encoder may be deployed on a terminal and the precoders on a server.
The target video segment is obtained and encoded at a constant code rate based on a plurality of precoders to obtain the sample coding data output by each precoder; the coding performance parameter of each precoder is determined according to the sample coding data, and the target picture group structure is determined from the plurality of candidate picture group structures according to the coding performance parameters. The target picture group structure obtained in this way is the candidate picture group structure with the best rate-distortion performance among the candidates, so the screening and matching of picture group structures is achieved.
In one possible implementation manner, when determining the coding performance parameter of each precoder according to the sample coding data and determining the target picture group structure according to the coding performance parameter from the plurality of candidate picture group structures, the coding performance parameter of each precoder may be determined according to the sample coding data, and the candidate picture group structure with the best coding performance parameter may be selected from the plurality of candidate picture group structures as the target picture group structure.
When each precoder is configured with the same target code rate, a peak signal-to-Noise Ratio (PSNR) of each precoder may be determined according to the sample encoded data, the peak signal-to-Noise Ratio is used as a coding performance parameter, the coding performance parameters corresponding to each precoder are compared, and a target picture group structure is determined from a plurality of candidate picture group structures according to a comparison result of the coding performance parameters.
The different precoders can all be configured with the same target code rate, and because the precoders encode the target video segment by adopting the constant code rate, the bit numbers of the code streams output by the different precoders can be the same under the condition, so that the peak signal-to-noise ratio can be directly used as the coding performance parameter for comparison, and the candidate picture group structure of the precoder with the highest peak signal-to-noise ratio is used as the target picture group structure. By configuring the same target code rate in different precoders, parameters to be compared can be reduced when determining the target picture group structure, and the determination efficiency of the target picture group structure can be improved.
In addition, when the precoders are configured with different target code rates, the peak signal-to-noise ratio of each precoder is determined according to the sample coding data, the number of code stream bits of each precoder is determined according to the sample coding data, and a target weight for the number of code stream bits is determined. The weighted bit number is obtained as the product of the target weight and the number of code stream bits, and the coding performance parameter is obtained as the sum of the peak signal-to-noise ratio and the weighted bit number. The coding performance parameters corresponding to the precoders are then compared, and the target picture group structure is determined from the plurality of candidate picture group structures according to the comparison result.
The different precoders can also be configured with different target code rates, so that the bit numbers of the code streams output by the different precoders are different, and therefore, the coding performance parameters can be obtained through the sum of the peak signal-to-noise ratio and the bit numbers of the code streams. Specifically, the target weight is used for eliminating the dimension difference between the peak signal-to-noise ratio and the bit number of the code stream, and function fitting can be performed through a plurality of groups of sample values of the peak signal-to-noise ratio and the bit number of the code stream, so that the target weight is determined, and then the coding performance parameters are obtained according to the sum of the peak signal-to-noise ratio and the weighted bit number for comparison, so that the comprehensiveness of the coding performance parameters can be improved, and meanwhile, the parameter limit on the precoder is reduced.
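A minimal sketch of the selection logic for both cases follows. The target weight used when the precoders have different target code rates would have to be fitted from sample values of the peak signal-to-noise ratio and the number of code stream bits as described above; the default value here is only a placeholder, and whether a fitted weight effectively penalises extra bits is an assumption.

```python
from typing import Optional, Sequence

def coding_performance(psnr_db: float,
                       stream_bits: Optional[int] = None,
                       target_weight: float = 0.0) -> float:
    """Coding performance parameter of one precoder.

    Same target code rate for all precoders: the PSNR alone is the parameter.
    Different target code rates: the parameter is the sum of the PSNR and the
    weighted bit number (target_weight * stream_bits), where the fitted
    target_weight removes the dimensional difference between the two quantities.
    """
    if stream_bits is None:
        return psnr_db
    return psnr_db + target_weight * stream_bits

def select_target_gop(candidate_gops: Sequence[object],
                      performance_params: Sequence[float]) -> object:
    """Pick the candidate picture group structure whose precoder achieved the
    best (largest) coding performance parameter."""
    best = max(range(len(candidate_gops)), key=lambda i: performance_params[i])
    return candidate_gops[best]
```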
In one possible implementation manner, when determining the peak signal-to-noise ratio of each precoder according to sample encoded data, decoding the sample encoded data to obtain a reference video segment; determining a first pixel value of an image in the target video segment and a second pixel value of an image in the reference video segment; calculating a mean square value between the first pixel value and the corresponding second pixel value; the number of pixel bits of the image in the target video segment is determined, and the peak signal-to-noise ratio of each precoder is determined based on the mean square value and the number of pixel bits.
Specifically, the peak signal-to-noise ratio can be expressed as:
PSNR = 10 * log10((2^n - 1)^2 / MSE)
wherein PSNR represents the peak signal-to-noise ratio, MSE represents the mean square value between the first pixel values and the corresponding second pixel values, and n is the number of pixel bits.
The first pixel values are the image pixel values of each video frame of the target video segment, and the second pixel values are the image pixel values of each video frame of the reference video segment. Since the sample coding data is the data output by the precoder after encoding each frame of the target video segment, decoding the sample coding data actually restores the target video segment; the restored frames are then compared with the pixel values of the corresponding frames in the original target video segment to obtain the peak signal-to-noise ratio. Determining the peak signal-to-noise ratio of each precoder by calculating the mean square value between the first pixel values and the corresponding second pixel values, combined with the number of pixel bits, allows the sample coding data to be compared with the original target video segment in fine detail, which improves the accuracy of the coding performance parameters and, in turn, the accuracy of the subsequently determined target picture group structure.
Wherein, each video frame in the target video segment corresponds to a peak signal-to-noise ratio, so the peak signal-to-noise ratio of each precoder may be an average value of the peak signal-to-noise ratios corresponding to each video frame.
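The per-frame PSNR computation and the averaging over frames can be sketched as follows, assuming the original and decoded frames are available as NumPy arrays of n-bit integer samples.

```python
import numpy as np
from typing import Sequence

def frame_psnr(original: np.ndarray, decoded: np.ndarray, pixel_bits: int = 8) -> float:
    """PSNR of one decoded frame against the corresponding original frame."""
    mse = np.mean((original.astype(np.float64) - decoded.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    peak = (2 ** pixel_bits - 1) ** 2
    return 10.0 * np.log10(peak / mse)

def precoder_psnr(original_frames: Sequence[np.ndarray],
                  decoded_frames: Sequence[np.ndarray],
                  pixel_bits: int = 8) -> float:
    """Peak signal-to-noise ratio of a precoder: the average of the per-frame PSNRs."""
    values = [frame_psnr(o, d, pixel_bits) for o, d in zip(original_frames, decoded_frames)]
    return float(np.mean(values))
```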
Step 303: encoding the target video segment based on the target encoder to obtain target encoded data of the target video segment.
After the target picture group structure is sent to the target encoder for configuration, the target video segment can be input to the target encoder for encoding; the encoding mode of the target encoder can be set according to the actual encoding requirements of the target video segment. Because the target picture group structure is flexibly screened and matched based on the precoders before being configured to the target encoder, a suitable target picture group structure can be selected adaptively even when target video segments with different characteristics and different scenes are encoded, which effectively improves the rate-distortion performance of video encoding.
In one possible implementation, a longer original video segment may need to be encoded, in which case the target video segment may be a portion of the original video segment. Therefore, when the original video segment is encoded, it can first be divided into a plurality of target video segments, and the target video segments are then encoded.
Based on the above, when the target video segment is obtained, the first frame length of each candidate picture group structure can be determined, and the target common multiple of a plurality of first frame lengths is calculated to obtain the second frame length; and acquiring an original video segment, and extracting a target video segment from the original video segment according to the second frame length.
The first frame length is the number of video frames contained in a candidate picture group structure, and the target common multiple may be the least common multiple. Because the target video segments are input to the precoders for encoding, if the frame length of the target video segments is too short, some precoders cannot encode several complete picture group structures. Calculating the target common multiple of the first frame lengths to obtain the second frame length, and then extracting the target video segments from the original video segment according to the second frame length, makes the frame length of the target video segments more reasonable, so that every precoder can encode several complete picture group structures.
For example, if the first frame length of the candidate picture group structure shown in fig. 4 is 8 and the first frame length of the candidate picture group structure shown in fig. 5 is 16, the second frame length is 16, and several target video segments with a frame length of 16 are extracted from the original video segment for encoding. As another example, if the first frame lengths of the two candidate picture group structures are 8 and 20 respectively, the second frame length is 40, and several target video segments with a frame length of 40 are extracted from the original video segment for encoding.
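Assuming the target common multiple is taken as the least common multiple, the second frame length and the extraction of target video segments can be sketched as:

```python
from math import lcm
from typing import List, Sequence, Tuple

def second_frame_length(first_frame_lengths: Sequence[int]) -> int:
    """Target common multiple (here: least common multiple) of the candidate
    picture group structures' first frame lengths."""
    return lcm(*first_frame_lengths)

def split_original_segment(num_frames: int,
                           first_frame_lengths: Sequence[int]) -> Tuple[List[range], int]:
    """Split an original video segment of num_frames frames into target video
    segments of the second frame length; the leftover is the remaining segment."""
    n = second_frame_length(first_frame_lengths)
    targets = [range(start, start + n) for start in range(0, num_frames - n + 1, n)]
    remaining = num_frames % n
    return targets, remaining

# Examples from the text: first frame lengths 8 and 16 give a second frame
# length of 16; first frame lengths 8 and 20 give a second frame length of 40.
assert second_frame_length([8, 16]) == 16
assert second_frame_length([8, 20]) == 40
```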
In one possible implementation manner, the original video segment is one of the segments that may be a complete video to be encoded, based on which, when the original video segment is acquired, the video to be encoded may be acquired specifically, and a plurality of first scene change frames in the video to be encoded are determined; and dividing the video to be coded according to the first scene change frame to obtain a plurality of original video fragments corresponding to different video scenes.
Specifically, the first scene change frame is used to distinguish original video segments of different scenes, and assuming that the scenes of two adjacent original video segments are different, the last video frame of the first original video segment may be used as the first scene change frame, or the first video frame of the second original video segment may be used as the first scene change frame. The video to be encoded is divided according to the first scene change frame to obtain a plurality of original video clips corresponding to different video scenes, so that different candidate picture group structures can be conveniently allocated to the original video clips of different scenes, the adaptation degree between the precoder and the original video clips is improved, and the rate distortion performance of video encoding is effectively improved.
In one possible implementation manner, after dividing to obtain a plurality of original video clips corresponding to different video scenes, any one video frame can be obtained from the original video clips and input into a pre-trained neural network model, the neural network model firstly extracts image feature vectors of the video frames, then maps the image feature vectors through a full connection layer to obtain mapped feature vectors, and then classifies the mapped feature vectors to obtain classified vectors, wherein vector elements of the classified vectors correspond to selection probabilities of different candidate picture group structures, and finally TopN candidate picture group structures are selected according to the number of precoders. In addition, the neural network model can be obtained through supervised training of sample image frames and sample picture group structures.
Based on this, referring to fig. 6, fig. 6 is a schematic diagram of segment division when a video to be encoded is encoded according to an embodiment of the present application, after the video to be encoded is obtained, the video to be encoded may be segmented according to a first scene change frame to obtain a plurality of original video segments corresponding to different video scenes, then, each original video segment is further divided, a second frame length is obtained according to a least common multiple of frame lengths of each candidate frame group structure, a target video segment is extracted from the original video segment according to the second frame length, and the target video segment is a minimum unit encoded by a precoder.
In one possible implementation manner, when determining a plurality of first scene change frames in the video to be encoded, framing the video to be encoded to obtain a plurality of candidate video frames; traversing the plurality of candidate video frames, determining a first image similarity between the current candidate video frame and a first adjacent video frame, and a second image similarity between the current candidate video frame and a second adjacent video frame, and determining the current candidate video frame or the second adjacent video frame as a first scene change frame of the video to be encoded when a difference between the second image similarity and the first image similarity is greater than or equal to a preset difference threshold.
Wherein the first adjacent video frame is a previous video frame adjacent to the current candidate video frame and the second adjacent video frame is a next video frame adjacent to the current candidate video frame.
In one possible implementation, the first image similarity between the current candidate video frame and the first adjacent video frame may be determined as follows: the current candidate video frame and the first adjacent video frame are input into a pre-trained image feature extraction model, the image feature vectors of the two frames are extracted, and the first image similarity is calculated from the image feature vectors. Alternatively, the current candidate video frame and the first adjacent video frame are converted to greyscale, the average of the pixel values of all pixels after greyscale conversion is calculated, and each pixel value is compared with this average: a pixel is set to 1 if its value is greater than the average and to 0 otherwise. This yields image feature information for the current candidate video frame and the first adjacent video frame, from which the first image similarity is calculated. It can be appreciated that the second image similarity is calculated on the same principle, which is not repeated here.
When the difference between the second image similarity and the first image similarity is greater than or equal to the preset difference threshold, the image difference between the current candidate video frame and the second adjacent video frame is large, so it can be determined that a scene switch occurs at the second adjacent video frame compared with the current candidate video frame, and the current candidate video frame can be determined as a first scene change frame of the video to be encoded. It can be understood that the second adjacent video frame may also be determined as the first scene change frame, and the difference threshold may be set according to the actual situation. Determining the first scene change frame from the difference between the second and first image similarities provides a comparative reference; compared with determining the first scene change frame directly from the second image similarity alone, this effectively reduces detection errors caused by how the difference threshold happens to be set, and improves the accuracy of the first scene change frame.
For example, assume the candidate video frames are video frames f1, f2, f3, f4 and f5, and traverse them. First, the current candidate video frame is f1, which has no first adjacent video frame and is not compared. Then the current candidate video frame is f2, with first adjacent video frame f1 and second adjacent video frame f3; the first image similarity between f2 and f1 and the second image similarity between f2 and f3 are calculated, the difference between the second and first image similarities is determined, and if this difference is greater than or equal to the preset difference threshold, f2 is a first scene change frame. Frames f3 and f4 are then judged in the same way. Finally, the current candidate video frame is f5, which has no second adjacent video frame and is not compared. This completes the process of determining the first scene change frames from the candidate video frames.
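A sketch of this detection loop is shown below. It uses the greyscale average-threshold comparison described above as the image similarity measure and interprets the "difference value" as the drop from the first image similarity to the second image similarity; the hash construction details and the difference threshold of 0.25 are illustrative assumptions.

```python
import numpy as np
from typing import List, Sequence

def ahash_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity in [0, 1] between two frames (H x W x 3 uint8 arrays) using an
    average-hash style comparison: greyscale, threshold against the mean,
    then count matching bits."""
    def bits(img: np.ndarray) -> np.ndarray:
        grey = img.mean(axis=2)
        return (grey > grey.mean()).ravel()
    ba, bb = bits(a), bits(b)
    return float(np.mean(ba == bb))

def first_scene_change_frames(frames: Sequence[np.ndarray],
                              diff_threshold: float = 0.25) -> List[int]:
    """Indices of first scene change frames: frames whose similarity to the next
    frame drops by at least diff_threshold compared with the similarity to the
    previous frame (the next frame could equally be marked instead)."""
    changes = []
    for i in range(1, len(frames) - 1):  # the first and last frames are not compared
        sim_prev = ahash_similarity(frames[i], frames[i - 1])  # first image similarity
        sim_next = ahash_similarity(frames[i], frames[i + 1])  # second image similarity
        if sim_prev - sim_next >= diff_threshold:
            changes.append(i)
    return changes
```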
In one possible implementation manner, the number of target video clips extracted from the original video clips may be multiple, different target video clips may correspond to different target picture group structures, and when the target video clips are encoded based on the target encoder to obtain target encoded data of the target video clips, a clip identifier corresponding to each target video clip may be specifically generated, and the target picture group structures corresponding to the clip identifier are marked according to the clip identifier; and determining a target picture group structure corresponding to the current target video segment according to the segment identification based on the target encoder, and encoding the current target video segment according to the corresponding target picture group structure to obtain target encoding data of the target video segment.
In this scenario, multiple target picture group structures may be configured into the target encoder, and the target encoder may support switching between different target picture group structures to encode different target video segments. Specifically, after the target picture group structure corresponding to each target video segment is determined according to the coding performance parameter, a data pair composed of the segment identifier and the target picture group structure may be generated; when encoding is performed, the corresponding target picture group structure may be matched according to the current target video segment to be encoded and the data pairs, and the encoder may then switch to the corresponding target picture group structure to encode the current target video segment.
By introducing segment identifiers, the encoding of the plurality of target video segments can be carried out more coherently and accurately when the plurality of target video segments are input at one time.
For example, the terminal may generate segment identifiers for the multiple target video segments, and the server receives the multiple target video segments and the corresponding segment identifiers sent by the terminal; after determining the target picture group structure of each target video segment according to the coding performance parameters, the server generates data pairs composed of each segment identifier and the corresponding target picture group structure, and returns the data pairs to the terminal.
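A minimal sketch of the data pairs and the switching step might look as follows; the configure_gop and encode methods are hypothetical encoder APIs used only to illustrate the idea.

```python
def build_segment_gop_map(segment_ids, target_gop_structures):
    """Form the (segment identifier, target picture group structure) data pairs."""
    return dict(zip(segment_ids, target_gop_structures))

def encode_segments(target_encoder, segments_by_id, gop_map):
    """Encode each target video segment after switching the encoder to its marked GOP structure."""
    encoded_by_id = {}
    for seg_id, segment in segments_by_id.items():
        target_encoder.configure_gop(gop_map[seg_id])           # hypothetical encoder API
        encoded_by_id[seg_id] = target_encoder.encode(segment)  # hypothetical encoder API
    return encoded_by_id
```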
In one possible implementation manner, when the target video segment is extracted from the original video segment according to the second frame length and then there is a remaining video segment smaller than the second frame length, the target picture group structure corresponding to the remaining video segment may be determined from the target picture group structures corresponding to the plurality of target video segments, and the encoding process may be performed on the remaining video segment based on the target encoder configured with the target picture group structure corresponding to the remaining video segment.
Since the frame length of the original video segment is not necessarily an integer multiple of the second frame length, there may be remaining video segments that are smaller than the second frame length.
For example, assuming that the number of precoders is two, the first frame lengths of the two candidate picture group structures are 8 and 16 respectively, the second frame length is 16, and accordingly the frame length of each target video segment is 16. If the frame length of the original video segment is 170, 10 target video segments can be extracted from the original video segment, the sum of the frame lengths of the 10 target video segments being 160, and the two precoders are then used to encode the target video segments, thereby determining the target picture group structure corresponding to each target video segment.
At this time, there is a remaining video segment having a third frame length of 10, and the target picture group structure corresponding to the remaining video segment can be determined according to the target picture group structures that have been confirmed for the target video segments.
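The arithmetic of this example can be reproduced with a short sketch (Python 3.9+ for math.lcm); the function name is chosen only for illustration.

```python
from math import lcm   # Python 3.9+

def plan_segments(original_frame_length, first_frame_lengths):
    """Second frame length (least common multiple of the candidate GOP frame lengths),
    number of whole target video segments, and the remaining segment's third frame length."""
    second_frame_length = lcm(*first_frame_lengths)          # e.g. lcm(8, 16) == 16
    num_target_segments = original_frame_length // second_frame_length
    remaining_frames = original_frame_length % second_frame_length
    return second_frame_length, num_target_segments, remaining_frames

print(plan_segments(170, [8, 16]))   # -> (16, 10, 10), matching the example above
```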
In one case, when the confirmed target picture group structures of the target video segments are all identical, that target picture group structure may be used as the target picture group structure corresponding to the remaining video segment.
For example, in the above example, if the first frame length of the target picture group structure of each of the 10 target video segments is 8, the first frame length of the target picture group structure corresponding to the remaining video segment is also 8.
In another case, when at least two different target picture group structures have been confirmed for the target video segments, the segment similarity between the remaining video segment and each target video segment may be determined, and the target picture group structure corresponding to the remaining video segment may be determined from the target picture group structures corresponding to the plurality of target video segments according to the segment similarities.
Specifically, when determining the segment similarity between the remaining video segment and each target video segment, at least one image frame capable of expressing the central subject may be sampled from the target video segment and from the remaining video segment, and the segment similarity may be obtained from the image similarity of these image frames; alternatively, a pre-trained neural network model may be used, the remaining video segment and the target video segment being input into the neural network model, which outputs the segment similarity between the remaining video segment and each target video segment. After the segment similarities are determined, the target picture group structure corresponding to the target video segment with the highest similarity to the remaining video segment may be used as the target picture group structure corresponding to the remaining video segment.
For example, in the above example, the first frame lengths of the target picture group structures of the 10 target video segments are 8 and 16, respectively; if the first frame length of the target picture group structure of the target video segment with the highest similarity to the remaining video segment is 8, the first frame length of the target picture group structure corresponding to the remaining video segment is also 8.
Because the third frame length of the remaining video segment is smaller than the second frame length, there may be a candidate picture group structure whose first frame length is greater than the third frame length. If the remaining video segment were encoded by each precoder at this point, a complete picture group structure might not be encoded, which would affect the accuracy and reliability of comparing the coding performance parameters of different precoders. Determining the target picture group structure of the remaining video segment directly from the segment similarity between the remaining video segment and each target video segment therefore allows the target picture group structure of the remaining video segment to be determined reasonably; meanwhile, since the remaining video segment is not encoded by the precoders, video coding efficiency is also improved to a certain extent.
In another case, when at least two different target picture group structures have been confirmed for the target video segments, the third frame length of the remaining video segment may be determined, and a target picture group structure whose first frame length is less than or equal to the third frame length may be used as the target picture group structure corresponding to the remaining video segment.
For example, in the foregoing example, the first frame lengths of the target picture group structures of the 10 target video segments are 8 and 16, respectively, and the third frame length is 10, so the first frame length of the target picture group structure corresponding to the remaining video segment may be 8.
Because the target picture group structures have been obtained through screening, using a target picture group structure whose first frame length is smaller than or equal to the third frame length as the target picture group structure corresponding to the remaining video segment allows the screening result to be inherited when the remaining video segment is encoded based on the target encoder configured with that structure, and it also ensures that at least one complete picture group structure can be encoded, thereby improving the reliability of encoding the remaining video segment.
When there is no target picture group structure whose first frame length is less than or equal to the third frame length, the target picture group structure of any one target video segment may be selected as the target picture group structure corresponding to the remaining video segment.
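The cases above can be combined into a single selection rule, sketched below with plain frame lengths standing in for the picture group structures; the argument names and the tie-breaking choices are assumptions for illustration.

```python
def gop_length_for_remaining_segment(confirmed_lengths, third_frame_length, similarities=None):
    """Pick the first frame length of the target GOP structure for the remaining video segment.

    confirmed_lengths: first frame lengths already confirmed for the target video segments.
    similarities: optional segment similarities to the remaining segment, aligned with confirmed_lengths.
    """
    # Case 1: all confirmed target picture group structures are identical.
    if len(set(confirmed_lengths)) == 1:
        return confirmed_lengths[0]
    # Case 2: follow the structure of the most similar target video segment, if similarities are known.
    if similarities is not None:
        best = max(range(len(confirmed_lengths)), key=lambda i: similarities[i])
        return confirmed_lengths[best]
    # Case 3: prefer a structure whose first frame length fits inside the remaining segment.
    fitting = [n for n in confirmed_lengths if n <= third_frame_length]
    return fitting[0] if fitting else confirmed_lengths[0]   # arbitrary fallback, as in the text
```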
By determining the target picture group structure corresponding to the remaining video segment from the target picture group structures corresponding to the target video segments in each of these situations, the target picture group structure can be determined more finely and without omission, which effectively improves the accuracy and reliability of video coding as a whole and improves the rate-distortion performance during video coding.
In one possible implementation, the target video segment may also not be obtained by dividing the original video segment based on the first scene change frame; instead, a plurality of second scene change frames in the target video segment may be determined, scene change degree information of the target video segment may be determined according to the plurality of second scene change frames, and a plurality of candidate picture group structures may be determined according to the scene change degree information and respectively sent to each precoder for configuration.
Here, similar in concept to the first scene change frame, the second scene change frame is used to distinguish segments of different scenes within the target video segment, and the scene change degree information is used to indicate how intense the scene changes of the target video segment are; for example, the scene change degree information may be the inter-frame average distance of the plurality of second scene change frames, or the number of second scene change frames, etc.
In one possible implementation manner, when the scene change degree information of the target video segment is determined according to the plurality of second scene change frames, specifically, the inter-frame distance between every two adjacent second scene change frames is determined, the average value of the plurality of inter-frame distances is calculated to obtain the inter-frame average distance, and the inter-frame average distance is used as the scene change degree information of the target video segment.
Here, the inter-frame distance is used to indicate the size of the interval between two second scene change frames, and may be, for example, the number of candidate video frames present between the two second scene change frames, or the time interval between the two second scene change frames. Taking the number of candidate video frames between two second scene change frames as an example, referring to fig. 7, fig. 7 is a schematic diagram of an alternative structure of a plurality of candidate video frames provided in an embodiment of the present application, where the plurality of second scene change frames are video frame f1, video frame f2, video frame f3, and video frame f4; there are 4 candidate video frames between video frame f1 and video frame f2, 3 candidate video frames between video frame f2 and video frame f3, and 5 candidate video frames between video frame f3 and video frame f4, so the inter-frame distances are 4, 3, 5 in sequence, the inter-frame average distance is (4+3+5)/3=4, and the scene change degree information is 4.
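The inter-frame average distance of the Fig. 7 example can be computed with the following sketch; the assumption here is that the second scene change frames are given as indices among the candidate frames, in ascending order.

```python
def inter_frame_average_distance(change_frame_indices):
    """Average number of candidate frames lying strictly between consecutive second scene change
    frames; assumes at least two indices are given in ascending order."""
    gaps = [change_frame_indices[i + 1] - change_frame_indices[i] - 1
            for i in range(len(change_frame_indices) - 1)]
    return sum(gaps) / len(gaps)

# Mirroring the Fig. 7 example, gaps of 4, 3 and 5 candidate frames give an average distance of 4.0.
```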
In one possible implementation manner, when the scene change degree information of the target video segment is determined according to the plurality of second scene change frames, the number of the second scene change frames may be determined and used as the scene change degree information of the target video segment.
For example, referring again to the example shown in fig. 7, the number of second scene change frames is 4, i.e., the scene change degree information is 4.
Further, the first frame length of the candidate picture group structure is shortened as the intensity indicated by the scene change degree information increases; that is, the more intense the scene changes of the target video segment indicated by the scene change degree information, the shorter the first frame length. Combining this with the previous examples of scene change degree information: when the scene change degree information is the inter-frame average distance of the plurality of second scene change frames, the shorter the inter-frame average distance, the higher the intensity of the scene changes and the shorter the first frame length; when the scene change degree information is the number of second scene change frames, the greater the number, the higher the intensity of the scene changes and the shorter the first frame length.
Accordingly, a correspondence between candidate picture group structures and scene change degree information may be preset; after the scene change degree information of the target video segment is determined, the corresponding candidate picture group structures are matched from the preset correspondence and configured to the precoders.
For example, the correspondence may be preset using the frame lengths of the candidate picture group structures, and each correspondence may be an array of the form [scene change degree information, frame length of the first candidate picture group structure, frame length of the second candidate picture group structure, ..., frame length of the N-th candidate picture group structure]. Taking the scene change degree information as the inter-frame average distance as an example, if the number of precoders is two, the preset correspondences may be [3,8,16], [6,12,16], etc. When the inter-frame average distance is 3, the frame lengths of the candidate picture group structures are 8 and 16, that is, the two precoders are respectively configured with a candidate picture group structure with a frame length of 8 and a candidate picture group structure with a frame length of 16; when the inter-frame average distance is 6, the two precoders are respectively configured with a candidate picture group structure with a frame length of 12 and a candidate picture group structure with a frame length of 16. It should be understood that the foregoing correspondences are only for illustrative purposes and may be set according to actual situations, which is not limited by the embodiments of the present application.
In addition, the scene change degree information in the correspondence may be a range of values; when the scene change degree information of the target video segment falls within a range in the preset correspondence, the corresponding candidate picture group structures can be matched and configured to the precoders, which improves the success rate and stability of candidate picture group structure matching.
In addition, the same scene change degree information may correspond to different combinations of candidate picture group structures, that is, the scene change degree information in several different correspondences may be the same. After the scene change degree information is determined, several different groups of candidate picture group structures may therefore be obtained; in this case, the frame length of the target video segment may be determined, and from the several groups of candidate picture group structures, a group whose frame lengths are all divisors of the frame length of the target video segment is selected and respectively configured into the precoders, so that each precoder can encode a plurality of complete picture group structures when encoding the target video segment.
For example, taking the scene change degree information as the inter-frame average distance, the preset correspondences may include [3,8,12], [3,8,16], [6,12,16] and [6,16,20]. When the inter-frame average distance is 3, the matched correspondences are [3,8,12] and [3,8,16]; if the frame length of the target video segment is 16, then since the frame lengths 8 and 16 of the candidate picture group structures are both divisors of the frame length 16 of the target video segment, the candidate picture group structures with frame lengths 8 and 16 are finally determined.
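This lookup, including the divisor filter of the example, can be sketched as follows; the table contents simply repeat the illustrative correspondences above and are not a prescribed configuration.

```python
# Each entry: [scene change degree information, frame length of candidate GOP 1, frame length of candidate GOP 2]
CORRESPONDENCES = [
    [3, 8, 12],
    [3, 8, 16],
    [6, 12, 16],
    [6, 16, 20],
]

def pick_candidate_gop_lengths(avg_distance, segment_frame_length):
    """Match the correspondences for this scene change degree, then keep a group whose
    frame lengths all divide the target video segment's frame length."""
    matched = [entry[1:] for entry in CORRESPONDENCES if entry[0] == avg_distance]
    for lengths in matched:
        if all(segment_frame_length % n == 0 for n in lengths):
            return lengths
    return matched[0] if matched else None   # fallback when no matched group divides evenly

print(pick_candidate_gop_lengths(3, 16))   # -> [8, 16], as in the example above
```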
Determining the plurality of candidate picture group structures according to the scene change degree information and respectively sending them to each precoder for configuration allows the precoders to be configured flexibly, so that the candidate picture group structures of the precoders fit the target video segment better; that is, the target picture group structure subsequently determined from the candidate picture group structures also fits the target video segment better, which effectively improves the rate-distortion performance of video coding.
In addition, referring to fig. 8, fig. 8 is another optional flowchart of a video encoding method provided in an embodiment of the present application, where the video encoding method may be performed by a terminal or may be performed by a server, and the video encoding method includes, but is not limited to, the following steps 801 to 802.
Step 801: acquiring a target video fragment and a target picture group structure, and sending the target picture group structure to a target encoder for configuration;
step 802: and encoding the target video segment based on the target encoder to obtain target encoded data of the target video segment.
The target picture group structure is obtained by performing constant code rate coding processing on the target video segment based on a plurality of precoders to obtain sample encoded data output by each precoder, determining coding performance parameters of each precoder according to the sample encoded data, and determining the target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, where the precoders are configured with preset candidate picture group structures and the candidate picture group structures of any two precoders are different.
Specifically, in the flow shown in fig. 8, the step of determining the target picture group structure is not performed locally, for example, the step of determining the target picture group structure is performed by a server, and the terminal is only responsible for configuring the target encoder and performing encoding processing based on the target encoder, so that the resource occupation of the terminal when performing video encoding can be effectively reduced. The terminal can send the target video segment to the server, the server carries out constant code rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder, coding performance parameters of each precoder are determined according to the sample coding data, a target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters, the target picture group structure is sent to the terminal, and the terminal carries out coding processing on the target video segment after configuring the target picture group structure to the target coder.
In one possible implementation manner, after receiving the target video segment sent by the terminal, the server may perform downsampling processing on the target video segment, thereby improving the coding efficiency of the precoders and thus the efficiency of determining the target picture group structure. Alternatively, the terminal may perform the downsampling processing on the target video segment and send the resulting downsampled video segment to the server, and the server encodes the downsampled video segment based on the precoders; since the downsampled video segment has a lower resolution and therefore a smaller data size than the target video segment, the resource occupation of data transmission between the terminal and the server can be reduced.
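A minimal downsampling sketch is shown below; the use of OpenCV and the 0.5 scale factor are assumptions of the example, not requirements of the method.

```python
import cv2  # assumption: OpenCV is available for the resize step

def downsample_segment(frames, scale: float = 0.5):
    """Reduce the resolution of every frame in a target video segment before precoding."""
    downsampled = []
    for frame in frames:
        h, w = frame.shape[:2]
        downsampled.append(cv2.resize(frame, (int(w * scale), int(h * scale)),
                                      interpolation=cv2.INTER_AREA))
    return downsampled
```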
In a possible implementation manner, the terminal may also send the video to be encoded to the server, the server determines a plurality of first scene change frames in the video to be encoded, segments the video to be encoded according to the first scene change frames to obtain a plurality of original video segments corresponding to different video scenes, and distributes different candidate picture group structures for the original video segments of different scenes, thereby improving the adaptation degree between the candidate picture group structures and the original video segments, extracting the target video segments from the original video segments for encoding, further improving the adaptation degree between the candidate picture group structures and the target video segments, and effectively improving the rate-distortion performance of video encoding.
In a possible implementation manner, the terminal may also send a plurality of target video segments to the server. The server may determine a plurality of second scene change frames in each target video segment, determine the scene change degree information of the target video segment according to the plurality of second scene change frames, further determine a plurality of candidate picture group structures according to the scene change degree information, and send the plurality of candidate picture group structures to each precoder for configuration. In this way the server can flexibly configure the precoders so that the candidate picture group structures of the precoders fit the target video segments better, that is, the target picture group structure subsequently determined from the candidate picture group structures also fits the target video segments better, which effectively improves the rate-distortion performance of video coding.
It will be appreciated that the video encoding method shown in fig. 8 and the video encoding method shown in fig. 3 are based on the same inventive concept, and specific details thereof may be found in the foregoing explanation, which is not repeated herein. The target video segment and the target picture group structure are obtained, where the target picture group structure is obtained by performing constant code rate coding processing on the target video segment based on a plurality of precoders to obtain sample encoded data output by each precoder, determining coding performance parameters of each precoder according to the sample encoded data, and determining the target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters; the obtained target picture group structure is thus the picture group structure with the best rate-distortion performance among the plurality of candidate picture group structures, achieving the effect of screening and matching picture group structures. The target picture group structure is further sent to the target encoder for configuration, and the target video segment is then encoded based on the target encoder to obtain the target encoded data of the target video segment.
The principle of the video encoding method provided by the embodiment of the present application is described in detail below based on practical examples.
Referring to fig. 9, fig. 9 is an optional complete flow diagram of a video coding method provided by an embodiment of the present application, where a video coding system is deployed on a terminal or a server. The video coding system at least includes a downsampling module, a precoder A, a precoder B and a target encoder, where the precoder A is configured with a candidate picture group structure a, the precoder B is configured with a candidate picture group structure b, and the precoder A and the precoder B are configured with the same target code rate; the frame length of the target video segment is the least common multiple of the frame lengths of the candidate picture group structure a and the candidate picture group structure b. The target video segment is input into the video coding system and is first downsampled by the downsampling module into a low-resolution downsampled video segment, and the downsampled video segment is respectively input into the precoder A and the precoder B. The precoder A encodes the downsampled video segment using the candidate picture group structure a in CBR mode, and the precoder B encodes the downsampled video segment using the candidate picture group structure b in CBR mode; the peak signal-to-noise ratio of the precoder A is determined after decoding the sample encoded data output by the precoder A, and the peak signal-to-noise ratio of the precoder B is determined after decoding the sample encoded data output by the precoder B. The two peak signal-to-noise ratios are compared to make a picture group structure decision, the candidate picture group structure corresponding to the precoder with the higher peak signal-to-noise ratio is configured as the target picture group structure to the target encoder, and the target video segment is input to the configured target encoder for encoding, so as to obtain target encoded data of the target video segment.
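The decision step of this flow can be sketched as follows; precode_cbr and decode are hypothetical precoder APIs standing in for a real encoder binding, and psnr_fn is any function that computes the peak signal-to-noise ratio between two frame sequences.

```python
def choose_target_gop(downsampled_frames, candidate_gops, target_bitrate, precoders, psnr_fn):
    """Keep the candidate picture group structure of the precoder with the highest PSNR."""
    best_gop, best_psnr = None, float("-inf")
    for precoder, gop in zip(precoders, candidate_gops):
        sample_bitstream = precoder.precode_cbr(downsampled_frames, gop, target_bitrate)  # hypothetical API
        decoded_frames = precoder.decode(sample_bitstream)                                # hypothetical API
        psnr = psnr_fn(downsampled_frames, decoded_frames)
        if psnr > best_psnr:
            best_gop, best_psnr = gop, psnr
    return best_gop
```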
In addition, referring to fig. 10, fig. 10 is a schematic diagram of another alternative complete flow of a video coding method provided in an embodiment of the present application, where a video coding system is deployed on a terminal or a server. The video coding system at least includes a downsampling module, a precoder A, a precoder B, and a target encoder, and the precoder A and the precoder B are configured with the same target code rate. The number of target video segments input into the video coding system is multiple. For each target video segment input into the video coding system, a plurality of scene change frames in the target video segment are determined, scene change degree information of the target video segment is then determined according to the plurality of scene change frames, and the candidate picture group structure a and candidate picture group structure b corresponding to the target video segment are determined according to the scene change degree information. When one of the target video segments is encoded, its corresponding candidate picture group structure a is configured to the precoder A, and its corresponding candidate picture group structure b is configured to the precoder B. The target video segment is first downsampled by the downsampling module into a low-resolution downsampled video segment, and the downsampled video segment obtained by downsampling is used as the input of both the precoder A and the precoder B. The precoder A encodes the downsampled video segment using the candidate picture group structure a in CBR mode, and the precoder B encodes the downsampled video segment using the candidate picture group structure b in CBR mode; the peak signal-to-noise ratio of the precoder A is determined after decoding the sample encoded data output by the precoder A, and the peak signal-to-noise ratio of the precoder B is determined after decoding the sample encoded data output by the precoder B. The peak signal-to-noise ratio of the precoder A is compared with the peak signal-to-noise ratio of the precoder B to make a picture group structure decision, the candidate picture group structure corresponding to the precoder with the higher peak signal-to-noise ratio is configured as the target picture group structure to the target encoder, and the target video segment is input to the configured target encoder for encoding, so as to obtain target encoded data of the target video segment.
In addition, referring to fig. 11, fig. 11 is another optional complete flow diagram of a video coding method provided by an embodiment of the present application, where the video coding system at least includes a downsampling module, a precoder A, a precoder B and a target encoder; the target encoder is deployed on a terminal, and the downsampling module, the precoder A and the precoder B are deployed on a server. The precoder A is configured with a candidate picture group structure a, the precoder B is configured with a candidate picture group structure b, and the precoder A and the precoder B are configured with the same target code rate. The terminal sends the target video segment to the server, where the frame length of the target video segment is the least common multiple of the frame lengths of the candidate picture group structure a and the candidate picture group structure b. The server downsamples the target video segment into a low-resolution downsampled video segment through the downsampling module, and the downsampled video segment is used as the input of both the precoder A and the precoder B. The precoder A encodes the downsampled video segment using the candidate picture group structure a in CBR mode, and the precoder B encodes the downsampled video segment using the candidate picture group structure b in CBR mode; the peak signal-to-noise ratio of the precoder A is determined after decoding the sample encoded data output by the precoder A, and the peak signal-to-noise ratio of the precoder B is determined after decoding the sample encoded data output by the precoder B. The two peak signal-to-noise ratios are compared to make a picture group structure decision, and the candidate picture group structure corresponding to the precoder with the higher peak signal-to-noise ratio is sent to the terminal. The terminal configures the received candidate picture group structure as the target picture group structure to the target encoder, and inputs the target video segment to the configured target encoder for encoding, so as to obtain target encoded data of the target video segment.
In addition, referring to fig. 12, fig. 12 is an alternative complete flow diagram of a video coding method provided by an embodiment of the present application, where a video coding system is deployed on a terminal or a server. The video coding system at least includes a segment preprocessing module, a downsampling module, a precoder A, a precoder B, and a target encoder, where the precoder A is configured with a candidate picture group structure a, the precoder B is configured with a candidate picture group structure b, and the precoder A and the precoder B are configured with the same target code rate. In this example, the segment preprocessing module first determines a plurality of scene change frames of the video to be encoded, divides the video to be encoded into a plurality of original video segments according to the plurality of scene change frames, and divides each original video segment into a plurality of target video segments according to the least common multiple of the frame lengths of the candidate picture group structure a and the candidate picture group structure b. Each target video segment is downsampled into a low-resolution downsampled video segment by the downsampling module, and the downsampled video segment obtained by downsampling is used as the input of both the precoder A and the precoder B. The precoder A encodes the downsampled video segment using the candidate picture group structure a in CBR mode, and the precoder B encodes the downsampled video segment using the candidate picture group structure b in CBR mode; the peak signal-to-noise ratio of the precoder A is determined after decoding the sample encoded data output by the precoder A, and the peak signal-to-noise ratio of the precoder B is determined after decoding the sample encoded data output by the precoder B. The two peak signal-to-noise ratios are compared, the candidate picture group structure corresponding to the precoder with the higher peak signal-to-noise ratio is configured as the target picture group structure to the target encoder, and the target video segment is input to the configured target encoder for encoding, so as to obtain target encoded data of the target video segment.
An exemplary application scenario of the video encoding method provided by the embodiment of the present application is described below.
The video coding method provided by the embodiment of the application can be applied to a local video coding scenario. Video coding software packaging a downsampling module, a target encoder and a plurality of precoders can be installed on a terminal. When the video coding software runs on the terminal, the target video segment to be encoded can be read from the local storage of the terminal or from another mobile storage device; after the target video segment is encoded based on the precoders, the precoder with the highest peak signal-to-noise ratio is determined, the candidate picture group structure of that precoder is configured to the target encoder, the target encoder is then used to encode the target video segment, and the resulting target encoded data is stored in the local storage of the terminal or in another mobile storage device.
The video coding method provided by the embodiment of the application can also be applied to cloud game scenarios. Cloud game software runs on a terminal, and the terminal sends the acquired game operation instructions to the server corresponding to the cloud game software. A video coding system is deployed in that server and includes a downsampling module, a target encoder and a plurality of precoders. After receiving the game operation instructions sent by the terminal, the server performs picture calculation and rendering based on the game operation instructions to obtain target video segments; after the target video segments are encoded based on the precoders, the precoder with the highest peak signal-to-noise ratio is determined, the candidate picture group structure of that precoder is configured to the target encoder, the target encoder is used to encode the target video segments to obtain target encoded data, and the target encoded data is sent to the terminal. The cloud game software on the terminal decodes the target encoded data, so that the corresponding game pictures are displayed.
The video coding method provided by the embodiment of the application can also be applied to a live broadcast scenario. A live broadcast terminal runs live broadcast software and sends the captured target video segments to the server corresponding to the live broadcast software. A video coding system is deployed in that server and includes a downsampling module, a target encoder and a plurality of precoders. After receiving the target video segments sent by the live broadcast terminal, the server encodes the target video segments based on the precoders, determines the precoder with the highest peak signal-to-noise ratio, configures the candidate picture group structure of that precoder to the target encoder, encodes the target video segments with the target encoder to obtain target encoded data, and sends the target encoded data to a viewing terminal. The viewing terminal also runs the live broadcast software, which decodes the target encoded data, so that the live content captured by the live broadcast terminal is displayed.
It will be appreciated that, although the steps in the flowcharts described above are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated in this embodiment, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts described above may include a plurality of sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; the execution order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least a portion of other steps or of the sub-steps or stages of other steps.
Referring to fig. 13, fig. 13 is a schematic diagram of an alternative structure of a first video encoding apparatus according to an embodiment of the present application, where the first video encoding apparatus 1300 includes:
The first encoding module 1301 is configured to obtain a target video segment, perform constant code rate encoding processing on the target video segment based on a plurality of precoders, and obtain sample encoded data output by each precoder, where the precoders are configured with a preset candidate picture group structure, and candidate picture group structures of any two precoders are different;
A first picture group structure configuration module 1302, configured to determine coding performance parameters of each precoder according to sample coding data, determine a target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, and send the target picture group structure to the target encoder for configuration;
The second encoding module 1303 is configured to encode the target video segment based on the target encoder to obtain target encoded data of the target video segment.
Further, the first encoding module 1301 is specifically configured to:
determining first frame lengths of each candidate picture group structure, and calculating target common multiple of a plurality of first frame lengths to obtain second frame lengths;
and acquiring an original video segment, and extracting a target video segment from the original video segment according to the second frame length.
Further, the first encoding module 1301 is specifically configured to:
acquiring a video to be encoded, and determining a plurality of first scene change frames in the video to be encoded;
and dividing the video to be coded according to the first scene change frame to obtain a plurality of original video fragments corresponding to different video scenes.
Further, the first encoding module 1301 is specifically configured to:
carrying out framing treatment on the video to be encoded to obtain a plurality of candidate video frames;
Traversing the plurality of candidate video frames, determining a first image similarity between a current candidate video frame and a first adjacent video frame, and a second image similarity between the current candidate video frame and a second adjacent video frame, wherein the first adjacent video frame is a previous video frame adjacent to the current candidate video frame, and the second adjacent video frame is a next video frame adjacent to the current candidate video frame;
and when the difference value between the second image similarity and the first image similarity is larger than or equal to a preset difference value threshold value, determining the current candidate video frame or the second adjacent video frame as a first scene change frame of the video to be encoded.
Further, the first group of pictures structure configuration module 1302 is further configured to:
When the target video segment is extracted from the original video segment according to the second frame length and the residual video segment smaller than the second frame length exists, determining a target picture group structure corresponding to the residual video segment according to a target picture group structure corresponding to the target video segment;
The second encoding module 1303 is further configured to:
And carrying out coding processing on the residual video segments based on the target encoder configured with the target picture group structure corresponding to the residual video segments.
Further, the first group of pictures structure configuration module 1302 is specifically configured to:
Determining the segment similarity between the residual video segments and each target video segment, and determining the target picture group structure corresponding to the residual video segments from the target picture group structures corresponding to the target video segments according to the segment similarity;
Or determining the third frame length of the residual video segment, and taking the target picture group structure of which the first frame length corresponding to the target video segment is smaller than or equal to the third frame length as the target picture group structure corresponding to the residual video segment.
Further, the first encoding module 1301 is specifically configured to:
Determining a plurality of second scene change frames in the target video clip;
determining scene change degree information of the target video segment according to the plurality of second scene change frames, wherein the scene change degree information is used for indicating the intensity degree of scene change of the target video segment;
And determining a plurality of candidate picture group structures according to the scene change information, and respectively sending the plurality of candidate picture group structures to each precoder for configuration, wherein the first frame length of the candidate picture group structures is shortened along with the increase of the intensity indicated by the scene change degree information.
Further, the first encoding module 1301 is specifically configured to:
Determining the inter-frame distance between two adjacent second scene change frames, calculating the average value of a plurality of inter-frame distances to obtain the inter-frame average distance, and taking the inter-frame average distance as the scene change degree information of the target video segment;
Or determining the frame number of the plurality of second scene change frames, and taking the frame number as scene change degree information of the target video segment.
Further, the first group of pictures structure configuration module 1302 is specifically configured to:
when each precoder is configured with the same target code rate, determining a peak signal-to-noise ratio of each precoder according to sample coding data, using the peak signal-to-noise ratio as a coding performance parameter, comparing coding performance parameters corresponding to each precoder, and determining a target picture group structure from a plurality of candidate picture group structures according to a comparison result of the coding performance parameters;
Or when each precoder is configured with different target code rates, determining peak signal-to-noise ratios of each precoder according to sample coding data, determining the bit number of the code stream of each precoder according to the sample coding data, determining the target weight of the bit number of the code stream, obtaining the weighted bit number according to the product of the target weight and the bit number of the code stream, obtaining coding performance parameters according to the sum of the peak signal-to-noise ratios and the weighted bit number, comparing the coding performance parameters corresponding to each precoder, and determining the target picture group structure from a plurality of candidate picture group structures according to the comparison result of the coding performance parameters.
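The two comparison rules above can be sketched as a single scoring function; the placeholder weight value is an assumption, chosen only to show that the bit count is traded off against quality.

```python
def coding_performance(psnr: float, bitstream_bits: int, same_target_bitrate: bool,
                       target_weight: float = -1e-6) -> float:
    """Coding performance parameter of one precoder.

    With equal target code rates the PSNR alone is compared; with different target code rates
    the parameter is the sum of the PSNR and the weighted bit count of the code stream."""
    if same_target_bitrate:
        return psnr
    return psnr + target_weight * bitstream_bits
```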
Further, the first group of pictures structure configuration module 1302 is specifically configured to:
Decoding the sample coded data to obtain a reference video segment;
Determining a first pixel value of an image in the target video segment and a second pixel value of an image in the reference video segment;
calculating a mean square value between the first pixel value and the corresponding second pixel value;
the number of pixel bits of the image in the target video segment is determined, and the peak signal-to-noise ratio of each precoder is determined based on the mean square value and the number of pixel bits.
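The steps above correspond to the standard peak signal-to-noise ratio formula; a sketch is given below, where stacking all frames into one array and the default 8-bit pixel depth are assumptions of the example.

```python
import numpy as np

def peak_signal_to_noise_ratio(target_frames, reference_frames, pixel_bits: int = 8) -> float:
    """PSNR between a target video segment and the decoded reference video segment."""
    a = np.stack(target_frames).astype(np.float64)      # first pixel values
    b = np.stack(reference_frames).astype(np.float64)   # second pixel values
    mse = np.mean((a - b) ** 2)                         # mean square value
    if mse == 0:
        return float("inf")
    max_val = (1 << pixel_bits) - 1                     # peak value from the number of pixel bits
    return 10.0 * np.log10(max_val ** 2 / mse)
```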
Further, the number of target video segments is plural, and the second encoding module 1303 is specifically configured to:
generating segment identifiers corresponding to all target video segments, and marking corresponding target picture group structures according to the segment identifiers;
And determining a target picture group structure corresponding to the current target video segment according to the segment identification based on the target encoder, and encoding the current target video segment according to the corresponding target picture group structure to obtain target encoding data of the target video segment.
Further, the first encoding module 1301 is specifically configured to:
Performing downsampling processing on the target video segment to obtain a downsampled video segment;
And carrying out constant code rate coding processing on the downsampled video segments based on the plurality of precoders to obtain sample coded data output by each precoder.
The video coding method shown in fig. 3 and the first video coding device 1300 are based on the same inventive concept, the target video segments are obtained, the coding processing with constant code rate is performed on the target video segments based on the plurality of precoders, so as to obtain sample coding data output by each precoder, then coding performance parameters of each precoder are determined according to the sample coding data, the target picture group structure is determined from the plurality of candidate picture group structures according to the coding performance parameters, the obtained target picture group structure is the picture group structure with the best rate distortion performance in the plurality of candidate picture group structures, so as to achieve the screening and matching effect of the picture group structure, the target picture group structure is further sent to the target encoder for configuration, and then the target video segments are coded based on the target encoder, so that the target coding data of the target video segments are obtained.
Referring to fig. 14, fig. 14 is a schematic diagram of an alternative structure of a second video encoding apparatus according to an embodiment of the present application, where the second video encoding apparatus 1400 includes:
A second frame group structure configuration module 1401, configured to obtain a target video clip and a target frame group structure, and send the target frame group structure to a target encoder for configuration;
a third encoding module 1402, configured to encode the target video segment based on the target encoder to obtain target encoded data of the target video segment;
The target picture group structure is based on a plurality of precoders to perform code rate constant coding processing on a target video segment, sample coding data output by each precoder is obtained, coding performance parameters of each precoder are determined according to the sample coding data, the target picture group structure is determined from a plurality of candidate picture group structures according to the coding performance parameters, the precoders are configured with preset candidate picture group structures, and the candidate picture group structures of any two precoders are different.
The second video encoding apparatus 1400 is based on the same inventive concept as the video encoding method shown in fig. 8. The target video segment and the target picture group structure are obtained, where the target picture group structure is obtained by performing constant code rate coding processing on the target video segment based on a plurality of precoders to obtain sample encoded data output by each precoder, determining coding performance parameters of each precoder according to the sample encoded data, and determining the target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters; the obtained target picture group structure is thus the picture group structure with the best rate-distortion performance among the plurality of candidate picture group structures, achieving the effect of screening and matching picture group structures. The target picture group structure is further sent to the target encoder for configuration, and the target video segment is encoded based on the target encoder to obtain target encoded data of the target video segment.
The electronic device for executing the video encoding method provided by the embodiment of the present application may be a terminal. Referring to fig. 15, fig. 15 is a partial block diagram of the terminal provided by the embodiment of the present application, where the terminal includes: a camera assembly 1510, a memory 1520, an input unit 1530, a display unit 1540, a sensor 1550, an audio circuit 1560, a wireless fidelity (WiFi) module 1570, a processor 1580, a power supply 1590, and the like. It will be appreciated by those skilled in the art that the terminal structure shown in fig. 15 is not limiting of the terminal and may include more or fewer components than shown, or may combine certain components, or use a different arrangement of components.
The camera assembly 1510 may be used to capture images or video. Optionally, the camera assembly 1510 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting and virtual reality (VR) shooting functions or other fused shooting functions.
The memory 1520 may be used to store software programs and modules, and the processor 1580 performs various functional applications and data processing of the terminal by executing the software programs and modules stored in the memory 1520.
The input unit 1530 may be used to receive input numerical or character information and generate key signal inputs related to the setting and function control of the terminal. In particular, the input unit 1530 may include a touch panel 1531 and other input devices 1532.
The display unit 1540 may be used to display input information or provided information and various menus of the terminal. The display unit 1540 may include a display panel 1541.
Audio circuitry 1560, speakers 1561, and microphone 1562 may provide an audio interface.
The power source 1590 may be alternating current, direct current, disposable battery or rechargeable battery.
The number of sensors 1550 may be one or more, the one or more sensors 1550 including, but not limited to: acceleration sensors, gyroscopic sensors, pressure sensors, optical sensors, etc. Wherein:
the acceleration sensor may detect the magnitudes of accelerations on three coordinate axes of a coordinate system established with the terminal. For example, an acceleration sensor may be used to detect the components of gravitational acceleration in three coordinate axes. The processor 1580 may control the display unit 1540 to display the user interface in a lateral view or a longitudinal view according to the gravitational acceleration signal acquired by the acceleration sensor. The acceleration sensor may also be used for the acquisition of motion data of a game or a user.
The gyroscope sensor can detect the body direction and the rotation angle of the terminal, and the gyroscope sensor can be cooperated with the acceleration sensor to collect the 3D action of the user on the terminal. The processor 1580 may implement the following functions according to the data collected by the gyro sensor: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.
The pressure sensor may be disposed at a side frame of the terminal and/or a lower layer of the display unit 1540. When the pressure sensor is arranged on the side frame of the terminal, the holding signal of the terminal by the user can be detected, and the processor 1580 can perform left-right hand identification or quick operation according to the holding signal acquired by the pressure sensor. When the pressure sensor is disposed at a lower layer of the display unit 1540, the processor 1580 controls the operability control on the UI interface according to the pressure operation of the user on the display unit 1540. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.
The optical sensor is used to collect the ambient light intensity. In one embodiment, the processor 1580 may control the display brightness of the display unit 1540 according to the intensity of ambient light collected by the optical sensor. Specifically, when the ambient light intensity is high, the display luminance of the display unit 1540 is turned up; when the ambient light intensity is low, the display brightness of the display unit 1540 is turned down. In another embodiment, the processor 1580 can also dynamically adjust the capture parameters of the camera assembly 1510 based on the intensity of ambient light captured by the optical sensor.
In this embodiment, the processor 1580 included in the terminal may perform the video encoding method of the previous embodiment.
The electronic device for performing the video encoding method according to the embodiment of the present application may also be a server. Referring to fig. 16, fig. 16 is a partial block diagram of the server according to the embodiment of the present application. The server 1600 may vary greatly depending on configuration or performance, and may include one or more central processing units (CPUs) 1622 (e.g., one or more processors), a memory 1632, and one or more storage media 1630 (e.g., one or more mass storage devices) storing application programs 1642 or data 1644. The memory 1632 and the storage medium 1630 may be transitory storage or persistent storage. The programs stored on the storage medium 1630 may include one or more modules (not shown), each of which may include a series of instruction operations on the server 1600. Further, the central processing unit 1622 may be configured to communicate with the storage medium 1630 to execute on the server 1600 the series of instruction operations in the storage medium 1630.
The server 1600 may also include one or more power supplies 1626, one or more wired or wireless network interfaces 1650, one or more input/output interfaces 1658, and/or one or more operating systems 1641, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, and the like.
The processor in server 1600 may be used to perform the video encoding method.
Embodiments of the present application also provide a computer readable storage medium storing program code for executing the video encoding method of the foregoing embodiments.
Embodiments of the present application also provide a computer program product comprising a computer program stored on a computer readable storage medium. The processor of the computer device reads the computer program from the computer-readable storage medium, and the processor executes the computer program so that the computer device performs the video encoding method described above.
The terms "first," "second," "third," "fourth," and the like in the description of the application and in the above figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented, for example, in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one (item)" means one or more, and "a plurality" means two or more. "and/or" for describing the association relationship of the association object, the representation may have three relationships, for example, "a and/or B" may represent: only a, only B and both a and B are present, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b or c may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
It should be understood that in the description of the embodiments of the present application, plural (or multiple) means two or more, and that greater than, less than, exceeding, etc. are understood to not include the present number, and that greater than, less than, within, etc. are understood to include the present number.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical functional division, and other divisions may exist in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should also be appreciated that the various embodiments provided by the embodiments of the present application may be arbitrarily combined to achieve different technical effects.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the above embodiments, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit and scope of the present application, and these equivalent modifications or substitutions are included in the scope of the present application as defined in the appended claims.
Claims (18)
1. A video encoding method, comprising:
obtaining a target video segment, and performing constant-code-rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder, wherein the precoders are configured with preset candidate picture group structures, and the candidate picture group structures of any two precoders are different;
determining coding performance parameters of each precoder according to the sample coding data, determining a target picture group structure from a plurality of the candidate picture group structures according to the coding performance parameters, and sending the target picture group structure to a target encoder for configuration; and
carrying out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment.
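For illustration only, the following Python sketch shows one way the selection step of claim 1 could look in code: several precoders, each bound to a different candidate picture group structure, encode the same segment at a constant code rate, and the structure whose sample coding data scores best is forwarded to the target encoder. The `PrecodeResult` class, its fields, and the example numbers are hypothetical and not part of the claim.

```python
from dataclasses import dataclass

@dataclass
class PrecodeResult:
    gop_structure: str   # candidate picture group structure, e.g. "IPPP" or "IBBP"
    sample_bits: int     # size of the sample coding data in bits
    psnr_db: float       # quality measured on the sample coding data

def select_target_gop(results):
    """Return the candidate picture group structure whose precoder achieved
    the best coding performance parameter (here simply the highest PSNR)."""
    return max(results, key=lambda r: r.psnr_db).gop_structure

# Hypothetical usage: one precoder per candidate structure, all run at the
# same constant code rate on the same target video segment.
results = [
    PrecodeResult("IPPP", sample_bits=1_200_000, psnr_db=38.2),
    PrecodeResult("IBBP", sample_bits=1_190_000, psnr_db=39.1),
]
print(select_target_gop(results))  # "IBBP" is sent to the target encoder
```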
2. The video encoding method according to claim 1, wherein the obtaining a target video segment comprises:
determining a first frame length of each candidate picture group structure, and calculating a target common multiple of the plurality of first frame lengths to obtain a second frame length; and
acquiring an original video segment, and extracting the target video segment from the original video segment according to the second frame length.
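Assuming the "target common multiple" of claim 2 is the least common multiple, a minimal sketch of the frame-length computation:

```python
from math import lcm

def second_frame_length(first_frame_lengths):
    """Least common multiple of the candidate picture group frame lengths,
    so that a segment of this length divides evenly under every candidate."""
    return lcm(*first_frame_lengths)

# Example: candidate frame lengths of 8, 12 and 16 frames -> 48-frame segments
print(second_frame_length([8, 12, 16]))  # 48
```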
3. The video encoding method according to claim 2, wherein the acquiring an original video segment comprises:
acquiring a video to be encoded, and determining a plurality of first scene change frames in the video to be encoded; and
dividing the video to be encoded into segments according to the first scene change frames to obtain a plurality of original video segments corresponding to different video scenes.
4. The video encoding method according to claim 3, wherein the determining a plurality of first scene change frames in the video to be encoded comprises:
performing framing processing on the video to be encoded to obtain a plurality of candidate video frames;
traversing the plurality of candidate video frames, and determining a first image similarity between a current candidate video frame and a first adjacent video frame and a second image similarity between the current candidate video frame and a second adjacent video frame, wherein the first adjacent video frame is the previous video frame adjacent to the current candidate video frame, and the second adjacent video frame is the next video frame adjacent to the current candidate video frame; and
when a difference value between the second image similarity and the first image similarity is greater than or equal to a preset difference threshold, determining the current candidate video frame or the second adjacent video frame as a first scene change frame of the video to be encoded.
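One possible reading of the test in claim 4, sketched in Python; the similarity function and the toy demo below are placeholders, not the application's own metric:

```python
def find_scene_change_frames(frames, similarity, diff_threshold=0.5):
    """Flag frame i when its similarity to the next frame exceeds its
    similarity to the previous frame by at least diff_threshold."""
    changes = []
    for i in range(1, len(frames) - 1):
        sim_prev = similarity(frames[i], frames[i - 1])  # first image similarity
        sim_next = similarity(frames[i], frames[i + 1])  # second image similarity
        if sim_next - sim_prev >= diff_threshold:
            changes.append(i)                            # current frame marked
    return changes

# Toy demo: "frames" are plain numbers and similarity is closeness in value.
frames = [1, 1, 1, 9, 9, 9]
sim = lambda a, b: 1.0 - min(abs(a - b) / 10.0, 1.0)
print(find_scene_change_frames(frames, sim))  # [3]
```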
5. The video encoding method according to any one of claims 2 to 4, further comprising:
when the target video segment is extracted from the original video segment according to the second frame length and a remaining video segment shorter than the second frame length exists, determining a target picture group structure corresponding to the remaining video segment according to the target picture group structure corresponding to the target video segment; and
carrying out coding processing on the remaining video segment based on the target encoder configured with the target picture group structure corresponding to the remaining video segment.
6. The video encoding method according to claim 5, wherein the determining a target picture group structure corresponding to the remaining video segment according to the target picture group structure corresponding to the target video segment comprises:
determining a segment similarity between the remaining video segment and each target video segment, and determining the target picture group structure corresponding to the remaining video segment from the target picture group structures corresponding to a plurality of target video segments according to the segment similarities; or
determining a third frame length of the remaining video segment, and taking a target picture group structure whose first frame length is less than or equal to the third frame length as the target picture group structure corresponding to the remaining video segment.
7. The video encoding method according to claim 1, wherein the candidate picture group structures are preset for the precoders by:
determining a plurality of second scene change frames in the target video segment;
determining scene change degree information of the target video segment according to the plurality of second scene change frames, wherein the scene change degree information is used for indicating the intensity of scene changes in the target video segment; and
determining a plurality of candidate picture group structures according to the scene change degree information, and respectively sending the plurality of candidate picture group structures to the precoders for configuration, wherein the first frame length of a candidate picture group structure decreases as the intensity indicated by the scene change degree information increases.
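An illustrative mapping in the spirit of claim 7, with invented thresholds and lengths: the more intense the scene changes, the shorter the first frame lengths of the candidate picture group structures offered to the precoders.

```python
def candidate_gop_lengths(scene_change_count):
    """Return candidate picture-group frame lengths; shorter GOPs are offered
    when the segment contains many scene change frames (values are examples)."""
    if scene_change_count > 20:
        return [4, 8]            # rapid cutting -> short candidate GOPs
    if scene_change_count > 5:
        return [8, 16]
    return [16, 32, 64]          # near-static content -> long candidate GOPs

print(candidate_gop_lengths(3))   # [16, 32, 64]
print(candidate_gop_lengths(25))  # [4, 8]
```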
8. The video encoding method according to claim 7, wherein the determining scene change degree information of the target video segment according to the plurality of second scene change frames comprises:
determining an inter-frame distance between every two adjacent second scene change frames, calculating an average value of the plurality of inter-frame distances to obtain an inter-frame average distance, and taking the inter-frame average distance as the scene change degree information of the target video segment; or
determining the number of frames of the plurality of second scene change frames, and taking the number of frames as the scene change degree information of the target video segment.
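Both alternatives of claim 8 in a short sketch; the frame indices are made up for the example:

```python
def scene_change_degree(change_frame_indices, use_count=False):
    """Either the number of second scene change frames, or the average
    distance, in frames, between consecutive ones."""
    if use_count:
        return len(change_frame_indices)
    gaps = [b - a for a, b in zip(change_frame_indices, change_frame_indices[1:])]
    return sum(gaps) / len(gaps)  # inter-frame average distance

print(scene_change_degree([10, 34, 70, 118]))                  # 36.0
print(scene_change_degree([10, 34, 70, 118], use_count=True))  # 4
```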
9. The video encoding method according to claim 1, wherein the determining coding performance parameters of each precoder according to the sample coding data and determining a target picture group structure from a plurality of the candidate picture group structures according to the coding performance parameters comprises:
when the precoders are configured with the same target code rate, determining a peak signal-to-noise ratio of each precoder according to the sample coding data, taking the peak signal-to-noise ratio as the coding performance parameter, comparing the coding performance parameters corresponding to the precoders, and determining the target picture group structure from the plurality of candidate picture group structures according to a comparison result of the coding performance parameters; or
when the precoders are configured with different target code rates, determining a peak signal-to-noise ratio of each precoder according to the sample coding data, determining a number of code stream bits of each precoder according to the sample coding data, determining a target weight of the number of code stream bits, obtaining a weighted bit number according to a product of the target weight and the number of code stream bits, obtaining the coding performance parameter according to a sum of the peak signal-to-noise ratio and the weighted bit number, comparing the coding performance parameters corresponding to the precoders, and determining the target picture group structure from the plurality of candidate picture group structures according to a comparison result of the coding performance parameters.
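A compact sketch of the coding performance parameter in claim 9; the claim does not fix the sign or magnitude of the bit weight, so the negative example weight below is only an assumption made for illustration:

```python
def coding_performance(psnr_db, stream_bits=None, bit_weight=None):
    """Plain PSNR when all precoders share one target code rate; otherwise
    the sum of PSNR and a weighted code-stream bit count."""
    if stream_bits is None or bit_weight is None:
        return psnr_db
    return psnr_db + bit_weight * stream_bits

# Same rate: compare PSNR values directly.
print(coding_performance(38.2) < coding_performance(39.1))                       # True
# Different rates: penalise larger streams with a (hypothetical) negative weight.
print(round(coding_performance(39.1, stream_bits=1_500_000, bit_weight=-1e-6), 1))  # 37.6
```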
10. The video encoding method according to claim 9, wherein the determining a peak signal-to-noise ratio of each precoder according to the sample coding data comprises:
decoding the sample coding data to obtain a reference video segment;
determining first pixel values of images in the target video segment and second pixel values of images in the reference video segment;
calculating a mean square value between the first pixel values and the corresponding second pixel values; and
determining a pixel bit number of the images in the target video segment, and determining the peak signal-to-noise ratio of each precoder according to the mean square value and the pixel bit number.
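The standard peak signal-to-noise ratio computation matches the quantities named in claim 10 (a mean square value between pixel values plus the pixel bit number); a NumPy sketch, assuming that formulation:

```python
import numpy as np

def psnr(original, decoded, bits_per_pixel=8):
    """PSNR = 10 * log10(MAX^2 / MSE), with MAX = 2**bits_per_pixel - 1."""
    original = np.asarray(original, dtype=np.float64)
    decoded = np.asarray(decoded, dtype=np.float64)
    mse = np.mean((original - decoded) ** 2)      # mean square value
    if mse == 0:
        return float("inf")                       # identical images
    peak = (2 ** bits_per_pixel) - 1              # from the pixel bit number
    return 10.0 * np.log10(peak ** 2 / mse)

print(round(psnr([[0, 255], [128, 64]], [[1, 254], [130, 66]]), 2))  # 44.15
```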
11. The video encoding method according to claim 1, wherein there are a plurality of target video segments, and the carrying out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment comprises:
generating a segment identifier corresponding to each target video segment, and marking the corresponding target picture group structure according to the segment identifier; and
determining, based on the target encoder, the target picture group structure corresponding to the current target video segment according to the segment identifier, and carrying out coding processing on the current target video segment according to the corresponding target picture group structure to obtain the target coding data of the target video segment.
12. The video encoding method according to claim 1, wherein the performing constant-code-rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder comprises:
performing downsampling processing on the target video segment to obtain a downsampled video segment; and
performing constant-code-rate coding processing on the downsampled video segment based on the plurality of precoders to obtain the sample coding data output by each precoder.
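A rough sketch of the preprocessing step in claim 12, assuming frames are 2-D arrays (e.g. NumPy) that support strided slicing and that each precoder exposes a constant-code-rate entry point; `encode_cbr` is a hypothetical method name, not an API defined by the application:

```python
def precode_downsampled(segment_frames, precoders, factor=2):
    """Spatially downsample the target video segment, then run every
    precoder on the smaller copy to obtain its sample coding data."""
    small = [frame[::factor, ::factor] for frame in segment_frames]  # naive 1/factor subsampling
    return [p.encode_cbr(small) for p in precoders]
```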
13. A video encoding method, comprising:
acquiring a target video segment and a target picture group structure, and sending the target picture group structure to a target encoder for configuration; and
carrying out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment,
wherein the target picture group structure is determined by performing constant-code-rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder, determining coding performance parameters of each precoder according to the sample coding data, and determining the target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, the precoders being configured with preset candidate picture group structures, and the candidate picture group structures of any two precoders being different.
14. A video encoding apparatus, comprising:
a first coding module, configured to obtain a target video segment, and perform constant-code-rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder, wherein the precoders are configured with preset candidate picture group structures, and the candidate picture group structures of any two precoders are different;
a first picture group structure configuration module, configured to determine coding performance parameters of each precoder according to the sample coding data, determine a target picture group structure from a plurality of the candidate picture group structures according to the coding performance parameters, and send the target picture group structure to a target encoder for configuration; and
a second coding module, configured to carry out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment.
15. A video encoding apparatus, comprising:
a second picture group structure configuration module, configured to acquire a target video segment and a target picture group structure, and send the target picture group structure to a target encoder for configuration; and
a third coding module, configured to carry out coding processing on the target video segment based on the target encoder to obtain target coding data of the target video segment,
wherein the target picture group structure is determined by performing constant-code-rate coding processing on the target video segment based on a plurality of precoders to obtain sample coding data output by each precoder, determining coding performance parameters of each precoder according to the sample coding data, and determining the target picture group structure from a plurality of candidate picture group structures according to the coding performance parameters, the precoders being configured with preset candidate picture group structures, and the candidate picture group structures of any two precoders being different.
16. An electronic device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the video encoding method of any one of claims 1 to 13 when executing the computer program.
17. A computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the video encoding method of any one of claims 1 to 13.
18. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the video encoding method of any one of claims 1 to 13.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310293622.7A CN118678068A (en) | 2023-03-15 | 2023-03-15 | Video encoding method, video encoding device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310293622.7A CN118678068A (en) | 2023-03-15 | 2023-03-15 | Video encoding method, video encoding device, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118678068A (en) | 2024-09-20
Family
ID=92721516
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310293622.7A (CN118678068A, Pending) | Video encoding method, video encoding device, electronic equipment and storage medium | 2023-03-15 | 2023-03-15
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118678068A (en) |
2023-03-15: application CN202310293622.7A filed in CN; published as CN118678068A (en); legal status: active, Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7822279B2 | | Apparatus and method for encoding and decoding multi-view picture using camera parameter, and recording medium storing program for executing the method
US10244167B2 | | Apparatus and methods for image encoding using spatially weighted encoding quality parameters
CN113170234B | | Adaptive encoding and streaming method, system and storage medium for multi-directional video
CN104541308A | | Encoding images using a 3D mesh of polygons and corresponding textures
KR20210134992A | | Distinct encoding and decoding of stable information and transient/stochastic information
TWI667914B | | Picture data encoding and decoding method and apparatus
CN112584049A | | Remote interaction method and device, electronic equipment and storage medium
US10051281B2 | | Video coding system with efficient processing of zooming transitions in video
US10536726B2 | | Pixel patch collection for prediction in video coding system
US20220398692A1 | | Video conferencing based on adaptive face re-enactment and face restoration
CN113366842B | | System and method for content layer based video compression
JP2008505522A | | Video processing
CN118678068A (en) | | Video encoding method, video encoding device, electronic equipment and storage medium
WO2019135270A1 | | Motion video analysis device, motion video analysis system, motion video analysis method, and program
US10917657B2 | | Method for encoding and decoding images, device for encoding and decoding images, and corresponding computer programs
Tran et al. | | Spherical LSB Data Hiding in 360° Videos Using Morphological Operations
CN116760986B | | Candidate motion vector generation method, candidate motion vector generation device, computer equipment and storage medium
US20240121408A1 | | Region of interest coding for vcm
CN113079372B | | Method, device and equipment for coding inter-frame prediction and readable storage medium
CN112437304B | | Video decoding method, encoding method, device, equipment and readable storage medium
US20240137558A1 | | Vertex motion vector predictor coding for vertex mesh (v-mesh)
US20240305789A1 | | Machine learning (ml)-based rate control algorithm to find quantization parameter (qp)
CN118632000A | | Image coding and decoding method, device and system
CN114697678A | | Image encoding method, image encoding device, storage medium, and image encoding apparatus
CN117764834A | | Image restoration method and device and electronic equipment
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication |
| PB01 | Publication |