
CN117714700A - Video coding method, device, equipment, readable storage medium and product


Info

Publication number
CN117714700A
CN117714700A
Authority
CN
China
Prior art keywords
video
image group
scene
frame
belongs
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311765472.1A
Other languages
Chinese (zh)
Inventor
段晨辉
陈靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shuhang Technology Beijing Co ltd
Original Assignee
Shuhang Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shuhang Technology Beijing Co ltd filed Critical Shuhang Technology Beijing Co ltd
Priority to CN202311765472.1A
Publication of CN117714700A
Legal status: Pending


Classifications

    All entries fall under H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION > H04N 19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals:
    • H04N 19/177 — using adaptive coding characterised by the coding unit, i.e. the structural or semantic portion of the video signal being the subject of the adaptive coding, the unit being a group of pictures [GOP]
    • H04N 19/136 — using adaptive coding controlled by incoming video signal characteristics or properties
    • H04N 19/142 — using adaptive coding with detection of scene cut or scene change
    • H04N 19/42 — characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses a video coding method, apparatus, device, readable storage medium and product, wherein the method comprises the following steps: acquiring an image group contained in video data to be processed and a target scene to which the image group belongs, the image group comprising a plurality of video frames; determining video coding parameters for the image group based on the target scene to which the image group belongs, the video coding parameters being used for performing video coding processing on the plurality of video frames contained in the image group; and respectively acquiring the types of the plurality of video frames, and performing video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames, to obtain coded data of the plurality of video frames contained in the image group. With the embodiment of the application, video data can be video-coded in combination with the specific scene, enriching video coding modes.

Description

Video coding method, device, equipment, readable storage medium and product
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a video encoding method, apparatus, device, readable storage medium, and product.
Background
In a live video scenario, live video data from the anchor end needs to be collected, video-coded, and then sent to a client for video display. However, current video coding methods generally apply fixed video coding parameters when coding video data; with such a single coding mode, the coded video data cannot meet the service requirements of different live scenes. How to video-code video data in combination with the specific scene, and thereby enrich the available video coding modes, is therefore an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides a video coding method, a device, equipment, a readable storage medium and a product, which can be used for video coding of video data in combination with specific scenes and enrich video coding modes.
In a first aspect, the present application provides a video encoding method, including:
acquiring an image group contained in video data to be processed and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
determining video coding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
And respectively acquiring the types of the plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
In a second aspect, the present application provides a video encoding apparatus, comprising:
the data acquisition unit is used for acquiring an image group contained in the video data to be processed and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
a parameter determining unit, configured to determine a video coding parameter for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
the video coding unit is used for respectively acquiring the types of the plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
In a third aspect, the present application provides a computer device comprising: a processor, a memory, a network interface;
The processor is connected to the memory and the network interface, wherein the network interface is configured to provide data communication functions, the memory is configured to store computer program code, and the processor is configured to call the computer program code to cause the computer device comprising the processor to perform the above video encoding method.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the above-described video encoding method.
In a fifth aspect, the present application provides a computer program product or computer program comprising computer instructions which, when executed by a processor, implement the video encoding method provided in the various alternatives in the first aspect of the present application.
In the embodiment of the application, an image group contained in video data to be processed and a target scene to which the image group belongs are acquired; the image group comprises a plurality of video frames; determining video coding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group; and respectively acquiring types of a plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group. When video encoding is carried out on video frames in the image group, corresponding video encoding parameters are selected by combining scenes to which the image group belongs, so that video encoding can be carried out on video data by combining specific scenes, and video encoding modes are enriched. In addition, the video coding processing is carried out by combining the video coding modes corresponding to the type selection of each video frame in the image group, so that the accuracy of video coding can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a network architecture diagram of a video coding system according to an embodiment of the present application;
fig. 2 is a schematic flow chart of a video encoding method according to an embodiment of the present application;
fig. 3 is a schematic diagram of a scene of a video processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a video frame provided in an embodiment of the present application;
fig. 5 is a flowchart of another video encoding method according to an embodiment of the present application;
fig. 6 is a schematic diagram of a composition structure of a video encoding apparatus according to an embodiment of the present application;
fig. 7 is a schematic diagram of a composition structure of a computer device according to an embodiment of the present application.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
The technical solution of the embodiments of the present application can be applied to scenarios in which video data in a live broadcast scene is video-coded. For example, on a video interaction platform, targeted video coding can be achieved, and video coding modes enriched, by determining the video coding parameters according to the scene to which the video data belongs and coding with the parameters corresponding to that scene. The technical solution of the embodiments of the present application may also be applied to any scenario requiring video coding, including but not limited to a video interaction platform, a social platform, a content interaction platform, an audio interaction platform, and so on, which is not limited in the embodiments of the present application.
It can be appreciated that the collection, use and processing of relevant data (such as video data to be processed, video coding parameters, target scenes, etc.) involved in the embodiments of the present application require the consent of the corresponding personal-information subject and compliance with the provisions of relevant laws and regulations.
Referring to fig. 1, fig. 1 is a network architecture diagram of a video coding system provided in an embodiment of the present application, as shown in fig. 1, a computer device may perform data interaction with terminal devices, and the number of the terminal devices may be one or at least two. For example, when the number of terminal apparatuses is plural, the terminal apparatuses may include the terminal apparatus 101a, the terminal apparatus 101b, the terminal apparatus 101c, and the like in fig. 1. Taking the terminal device 101a as an example, the computer device 102 may acquire an image group included in the video data to be processed and a target scene to which the video data to be processed belongs. Further, the computer device 102 may determine video encoding parameters for the group of images based on the target scene to which the group of images belongs. The computer device 102 may obtain types of a plurality of video frames, and perform video encoding processing on the plurality of video frames included in the image group based on the video encoding parameters of the image group and the types of the plurality of video frames, so as to obtain encoded data of the plurality of video frames included in the image group. Optionally, the computer device 102 may further send the encoded data of the plurality of video frames included in the image group to the terminal device 101a, so that the terminal device 101a performs video decoding processing on the encoded data, and then displays the corresponding video data.
The computer device mentioned in the embodiment of the present application may refer to a server or a terminal device, or may be a system formed by a server and a terminal device, which is not limited in this embodiment of the present application. The terminal device may be an electronic device including, but not limited to, a cell phone, tablet computer, notebook computer, vehicle mounted device, etc. The server mentioned above may be an independent physical server, may be a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, and basic cloud computing services such as big data and artificial intelligence platforms.
Further, referring to fig. 2, fig. 2 is a flow chart of a video encoding method according to an embodiment of the present application; as shown in fig. 2, the video encoding method may be applied to a computer device, and includes, but is not limited to, the following steps:
s101, acquiring an image group contained in video data to be processed and a target scene to which the image group belongs.
Referring to fig. 3, fig. 3 is a schematic diagram of a scene of a video processing method provided in an embodiment of the present application. As shown in fig. 3, after the anchor end starts a live broadcast, the video capture device of the anchor end may collect the live video data of the anchor end, and the video data is preprocessed; the preprocessing may include video optimization processing such as beautifying and video special effects, yielding preprocessed video data. Further, video coding processing can be performed on the preprocessed video data to obtain coded data, and the coded data is packaged and transmitted: the coded data is packaged into data packets, which are transmitted over a network to the receiving side, where the packets are received and reassembled into frames, thereby realizing the transmission of the coded data. The data packets are then post-processed, for example checked to determine whether their format is correct and whether frames have been lost; if the format is correct and no data is lost, the video decoder at the viewer end performs video decoding processing on the data packets to obtain decoded data, namely the preprocessed video data. Further, the viewer end can render the decoded data and display the rendered result, so that the live video data of the anchor end, after video optimization processing such as beautifying and video special effects, can be viewed through the viewer end.
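The flow just described can be summarized as two cooperating pipelines. The sketch below is a minimal, runnable illustration under stated assumptions: every stage is a stub standing in for a real capture, preprocessing, coding, transport, decoding or rendering component, and all function names are hypothetical rather than part of this application.

```python
def preprocess(frame):        # e.g. beautifying, video special effects
    return f"pre({frame})"

def video_encode(frame):      # video coding processing (detailed in S102/S103)
    return f"enc({frame})"

def packetize(data):          # package coded data into data packets
    return [f"pkt({data})"]

def video_decode(packet):     # inverse of video_encode at the viewer end
    return f"dec({packet})"

def render(frame):            # video rendering before display
    return f"rend({frame})"

# Anchor end: capture -> preprocess -> encode -> packetize -> send.
captured = ["frame0", "frame1"]
packets = [p for f in captured for p in packetize(video_encode(preprocess(f)))]

# Viewer end: receive -> decode -> render -> display.
for pkt in packets:           # iteration stands in for transmission over the network
    print(render(video_decode(pkt)))
```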
In this embodiment of the present application, the video data to be processed may refer to video data in a live video scene, which may include, but is not limited to, live performance data from a live studio, live game data of a particular match, and other live video. After the video capture device of the anchor end collects the video data of the anchor end's live scene, the collected video data can be uploaded, yielding the video data to be processed. Further, video division processing may be performed on the video data to be processed, dividing it into image groups.
The image group may contain a plurality of video frames, and the video frames contained in one image group are consecutive. An image group is a basic video unit, namely a Group of Pictures (GOP): a set of consecutive pictures. In the process of video coding the video data to be processed, dividing it into image groups facilitates the subsequent video coding processing.
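As a minimal sketch of this division step, assuming a fixed GOP length (the value 32 follows the example given later in this description):

```python
def split_into_gops(frames, gop_len=32):
    """Divide consecutive video frames into image groups (GOPs)."""
    return [frames[i:i + gop_len] for i in range(0, len(frames), gop_len)]

gops = split_into_gops(list(range(100)), gop_len=32)
print([len(g) for g in gops])   # [32, 32, 32, 4] -- frames within each group stay consecutive
```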
Further, the target scene may be a live scene, and it may be one of a variety of live scenes; for example, the live scenes may include a first scene, a second scene and a third scene. The first scene may refer to a scene requiring link-mic live broadcasting (co-streaming); the second scene may refer to a scene that does not require link-mic and in which the number of live viewers is less than or equal to a preset number; and the third scene may refer to a scene that does not require link-mic and in which the number of live viewers is greater than the preset number. For example, the first scene may be a link-mic live scene, i.e., a scene in which the anchor end co-streams with a viewer end, or with other anchor ends, and so on. The second scene may be a normal live scene, for example a game live, singing live, dance live or live-commerce scene in which the number of live viewers is less than or equal to the preset number. The third scene may be a hot live scene, for example a scene in which a popular anchor broadcasts and the number of live viewers is greater than the preset number, such as a satellite TV live broadcast or a live broadcast by an authoritative industry expert.
In a live scene, three indexes are important for the video data: delay, definition and fluency. Delay indicates the data transmission speed at which the video sending end transmits video data to the video receiving end during the live broadcast; the smaller the delay, the faster the transmission, and the sooner the client can view the video data of the live scene. Definition refers to the image quality of the video in the video data: the higher the definition, the higher the video image quality, and the lower the definition, the lower the video image quality. Fluency refers to the frame rate: the lower the frame rate, the worse the video coherence of the video data, and the higher the frame rate, the better the video coherence.
However, in general, video data in any live scene involves a trade-off among these three indexes: for example, the lower the latency, the lower the sharpness and smoothness; the higher the sharpness, the higher the latency and the lower the smoothness; and the higher the smoothness, the higher the latency and the lower the sharpness. Different live scenes place different requirements on the three indexes, so coding the video data of each live scene with a different video coding mode can improve the index that matters in that scene. For example, the first scene, a link-mic live scene, requires the delay to be reduced to achieve a better co-streaming effect: the lower the delay in a link-mic live scene, the smoother the live process and the better the live effect. In the second scene, the video fluency needs to be improved to achieve a better live effect. In the third scene, the video definition needs to be improved to achieve a better live effect.
Because current schemes adopt a unified coding mode when video coding is performed, the image quality of the video data displayed on the client is the same in all scenes. Yet live broadcasts can be subdivided into scenes such as link-mic live, normal live and hot live, and the requirements on the video data differ in each. For link-mic live, the requirement on delay is high, and the delay of the live video data must be reduced as much as possible so as not to degrade the user experience. For hot live, such as a popular anchor selling goods, there are many viewers and no link-mic requests, so the requirement on video quality is higher and the image quality must be finer. For normal live, with no link-mic requests and not many viewers, such as game live or sports live, the requirement on smoothness is higher. Therefore, for each live scene, video coding processing needs to be performed in a targeted manner to improve the user's subjective experience.
Because the index requirements of each scene differ while the video parameters of the video data recorded at the anchor end are fixed, different video coding parameters can be selected when video-coding the video data to be processed. In this way, for the same source video data, after video coding and subsequent decoding, the video displayed at the client (the anchor end or the viewer end) can exhibit different trade-offs among the three indexes of fluency, delay and definition depending on the live scene.
In the embodiment of the present application, for example, if the target scene to which the image group belongs is the first scene, performing video coding processing on the image groups contained in the video data to be processed in combination with that scene can reduce the delay of the video data viewed by the audience. If the target scene to which the image group belongs is the second scene, coding in combination with that scene can improve the fluency of the video data viewed by the audience. If the target scene to which the image group belongs is the third scene, coding in combination with that scene can improve the definition of the video data viewed by the audience, i.e., the video image quality. Targeted video coding in combination with the scene corresponding to the video data to be processed can thus be realized, enriching video coding modes.
In this embodiment of the present application, if the video data to be processed includes an image group, the target scene to which the image group belongs is the target scene to which the video data to be processed belongs. If the video data to be processed contains a plurality of image groups, the target scene to which the video data to be processed belongs comprises the target scene to which the plurality of image groups belong. For example, if the video duration of the obtained video data to be processed is long, the scene to which the previous period belongs in the video data to be processed is a first scene, and the scene to which the next period belongs in the video data to be processed is a second scene, and if the target scenes to which the plurality of image groups included in the video data to be processed respectively belong may be different, the target scenes to which each image group belongs may be obtained for each image group respectively, so that the target scenes to which each image group belongs are respectively subjected to targeted video coding, and corresponding coding data are obtained.
In the embodiment of the application, when acquiring the target scene to which the image group belongs, the service end can detect the target scene of the image group; or image recognition can be performed on the plurality of video frames contained in the image group to determine the target scene to which it belongs; or a specified scene may be acquired as the target scene, for example a scene specified by the client. The specific manner of determining the target scene to which an image group belongs is described in detail in step S201 below and is not elaborated here.
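The three alternatives above can be sketched as a simple dispatch. The priority order among them and all names below are illustrative assumptions; the actual determination is detailed in step S201.

```python
def recognize_scene(gop):
    # Stub: a real system would run image recognition over the GOP's frames here.
    return "normal"

def target_scene_of(gop, detected_by_server=None, specified_by_client=None):
    if specified_by_client is not None:     # a designated scene, e.g. specified by the client
        return specified_by_client
    if detected_by_server is not None:      # scene detected at the service end
        return detected_by_server
    return recognize_scene(gop)             # fall back to image recognition

print(target_scene_of(["f0", "f1"], specified_by_client="link_mic"))   # link_mic
```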
S102, determining video coding parameters for the image group based on the target scene to which the image group belongs.
In the embodiment of the present application, the principle of video coding processing is to control the video coding parameters in a video encoder: the encoder codes based on those parameters, and different parameter settings yield different playback effects at the viewer end after the coded video data is subsequently decoded. Because current coding modes generally set fixed video coding parameters, the video data of every scene is coded with the same fixed parameters, the video effect displayed at the viewer end is the same in every scene, and targeted video display cannot be realized.
Therefore, in the embodiment of the application, aiming at different target scenes to which the image groups in the video data to be processed belong, the video coding parameters of the corresponding scenes can be selected, so that the video coder is controlled to carry out video coding processing by using the video coding parameters of the corresponding scenes, and the coded video data can meet the service requirements under the corresponding scenes. The video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group.
In one embodiment, the video coding parameters of the image group include at least one of: the number of reference frames, the fractional pixel search accuracy, and the number of buffered frames. The number of reference frames refers to the number of video frames that need to be referenced when coding each video frame contained in the image group. The fractional pixel search accuracy is used to search a reference frame for reference image blocks matching a target image block in each video frame, a reference image block being an image block in the reference frame whose pixel similarity to the target image block is greater than a similarity threshold. The number of buffered frames refers to the number of video frames that are buffered; the buffered video frames are used to allocate the code rate of each video frame in the image group.
The number of reference frames affects how accurately the motion of moving objects is represented in the coded data: the more reference frames, the finer the video coding, but correspondingly the lower the video coding efficiency; the fewer the reference frames, the higher the video coding efficiency. The target image block may refer to an image block in the current frame, i.e., the current image block. An image block may refer to a macroblock; in video coding technology, video compression is performed by dividing the image picture of a video frame into macroblocks. Fractional pixel search accuracy means that fractional-pixel search is supported when searching for the motion vector of the current image block over a reference frame; the fractional pixels are synthesized from integer pixels through filtering, because motion in video is not necessarily whole-pixel motion. The finer the supported fractional pixels, the greater the computational effort, and thus the more accurate the resulting motion vector and the higher the coding quality. In other words, fractional pixel search accuracy governs the search in the reference frame for the image block most similar to an image block in the current video frame; the common search mode is whole-pixel search, but in practice object motion varies considerably, so setting a fractional pixel search accuracy, i.e., searching in 1/2- or 1/4-pixel units rather than whole-pixel units, improves the accuracy of the motion description. The larger the number of buffered frames, the more video frames can be referenced during video coding: the coding is finer, but the video coding efficiency is lower. By buffering video frames, code rate can be allocated across the buffered frames, so that coding can proceed in combination with the code rate of each video frame.
In the embodiment of the application, video coding processing of video data is realized by controlling the video coding parameters in the video encoder, namely the number of reference frames, the fractional pixel search accuracy and the number of buffered frames; different settings of these three parameters produce different video coding effects, and hence different playback effects when the video data is played at the client. For example, the larger the number of reference frames, the more video frames are referenced when coding each video frame, and the higher the video definition of the coded data when it is subsequently displayed. The greater the fractional pixel search accuracy, the finer the search for image blocks in the reference frame during coding, the finer the description of motion, and the higher the video fluency of the coded data when it is subsequently displayed. The larger the number of buffered frames, the more video frames are buffered; before video coding, each buffered frame can be evaluated separately so that a different code rate is allocated to each video frame, which improves video definition.
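Collected in one place, the three parameters discussed above might look as follows. This is a sketch, not the option set of any particular encoder; the field names are illustrative, and the example values are taken from the link-mic example given below.

```python
from dataclasses import dataclass

@dataclass
class VideoCodingParams:
    ref_frames: int          # number of reference frames per coded frame
    subpel_precision: int    # fractional pixel search precision (1 -> 1/2 pel, 2 -> 1/4 pel)
    buffered_frames: int     # number of buffered frames used for code-rate allocation

low_latency = VideoCodingParams(ref_frames=1, subpel_precision=1, buffered_frames=1)
print(low_latency)
```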
In one embodiment, if the target scene to which the image group belongs is the first scene, for example a link-mic live scene, it may be determined that the number of reference frames belongs to a first number interval, the fractional pixel search precision belongs to a first search precision interval, and the number of buffered frames belongs to a first buffer quantity interval. The first number interval may comprise one or more values, for example 1 and 2. The first search precision interval may include one or more values, for example 1 and 1.5, and the first buffer quantity interval may include one or more values, for example 1, 2, 3, 4, 5, and so on.
For example, in a link-mic live scene, if only latency needs to be reduced, the number of reference frames may be set to a smaller value of the first number interval, such as 1. If it is desired to moderately improve video sharpness while still keeping latency low, the number of reference frames may be set to a larger value of the first number interval, such as 2. By slightly relaxing the latency requirement, the sharpness can be improved accordingly, achieving a trade-off.
In one embodiment, if the target scene to which the image group belongs is the second scene, for example a normal live scene, it may be determined that the number of reference frames belongs to the first number interval, the fractional pixel search precision belongs to a second search precision interval, and the number of buffered frames belongs to a second buffer quantity interval. The minimum value of the second search precision interval may be greater than the maximum value of the first search precision interval, and the minimum value of the second buffer quantity interval is greater than the maximum value of the first buffer quantity interval. The second search precision interval may comprise one or more values, for example 2 and 3. The second buffer quantity interval may include one or more values, for example 6, 7, 8, 9, 10, etc.
For example, in a normal live scene, if only smoothness needs to be improved, the fractional pixel search precision may be set to a larger value of the second search precision interval, such as 3. If it is desired to moderately improve video sharpness while still improving fluency, the fractional pixel search precision may be set to a smaller value of the second search precision interval, such as 2. By slightly relaxing the fluency requirement, the sharpness can be improved accordingly, achieving a trade-off.
In one embodiment, if the target scene to which the image group belongs is the third scene, for example a hot live scene, it is determined that the number of reference frames belongs to a second number interval, the fractional pixel search precision belongs to the second search precision interval, and the number of buffered frames belongs to a third buffer quantity interval. The minimum value of the second number interval is larger than the maximum value of the first number interval, and the minimum value of the third buffer quantity interval is larger than the maximum value of the second buffer quantity interval. The third buffer quantity interval may include one or more values, for example 11, 12, 13, 14, 15, …, 20, and so on.
For example, in a hot live scene, if only the sharpness needs to be improved, the number of buffered frames may be set to a larger value of the third buffer quantity interval, such as 20. If it is desired to moderately reduce video latency while still improving sharpness, the number of buffered frames may be set to a smaller value of the third buffer quantity interval, such as 15. By slightly relaxing the sharpness requirement, the latency can be improved accordingly, achieving a trade-off.
For example, if the target scene is a link-mic live scene, then in order to reduce latency the number of reference frames may be 1, the fractional pixel search precision may be 1, and the number of buffered frames may be 1. A fractional pixel search precision of 1 can indicate that the search is performed in 1/2-pixel units, and the smaller the number of reference frames and buffered frames, the faster the video coding, so video coding efficiency is improved and the delay is reduced.
Or, for example, if the target scene is a normal live scene, then in order to improve fluency the number of reference frames may be 1, the fractional pixel search precision may be 2, and the number of buffered frames may be 10. A fractional pixel search precision of 2 can indicate that the search is performed in 1/4-pixel units, so motion can be described more finely. The larger number of buffered frames reduces video coding efficiency, but the small number of reference frames together with the larger number of buffered frames improves fluency.
Or, for example, if the target scene is a hot live scene, then in order to improve definition the number of reference frames may be 3, the fractional pixel search precision may be 2, and the number of buffered frames may be 20. The larger the number of buffered frames, the lower the video coding efficiency but the finer the coding; the larger number of reference frames means that more video frames are referenced to describe the image changes in each frame, which improves sharpness.
In the embodiment of the application, different video coding parameters can be determined for the image groups in different scenes, so that targeted video coding processing of the image groups in different scenes can be realized, and the video coding effect is improved.
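Putting the three scenarios together, the parameter selection of S102 can be sketched as a lookup. The concrete values mirror the link-mic, normal-live and hot-live examples above; the table layout itself is an illustrative assumption.

```python
SCENE_PARAMS = {
    # first scene: link-mic live -- minimize latency
    "link_mic": {"ref_frames": 1, "subpel_precision": 1, "buffered_frames": 1},
    # second scene: normal live -- maximize fluency
    "normal":   {"ref_frames": 1, "subpel_precision": 2, "buffered_frames": 10},
    # third scene: hot live -- maximize sharpness
    "hot":      {"ref_frames": 3, "subpel_precision": 2, "buffered_frames": 20},
}

def coding_params_for(target_scene):
    """Return the video coding parameters for the image group's target scene."""
    return SCENE_PARAMS[target_scene]

print(coding_params_for("hot"))   # {'ref_frames': 3, 'subpel_precision': 2, 'buffered_frames': 20}
```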
Referring to fig. 4, fig. 4 is a schematic diagram of video frames provided in an embodiment of the present application. The white box in fig. 4 represents the current frame, i.e., the video frame currently being coded; the gray boxes represent the consecutive video frames before and after the current frame; and the arrows from the white box point to the reference frames of the current frame. If the current frame is a key frame, no other video frames need to be referenced when video coding it. If the current frame is a bidirectional reference frame, the gray boxes pointed to by the arrows are its reference frames, i.e., both preceding and following video frames can be referenced; as shown in 4a in fig. 4, the current frame may have 3 reference frames. If the current frame is a forward reference frame, only preceding video frames can be referenced; as shown in 4b in fig. 4, the current frame may have 1 reference frame. As shown in 4c in fig. 4, a fractional pixel search precision of 1 means that, when searching for the image block most similar to the current image block, the search moves in steps of 1/2 pixel unit. As shown in 4d in fig. 4, the number of buffered frames may be 8, the buffered frames including the current frame and the consecutive video frames before and after it.
S103, respectively acquiring types of a plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
In this embodiment of the present application, the image group contains a plurality of video frames whose types differ, and the video coding mode of each video frame differs with its type. When video coding each video frame, coding is performed in combination with the video coding parameters and the video coding mode of that frame. Therefore, by respectively acquiring the types of the plurality of video frames and performing video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the frames, the coded data of the plurality of video frames contained in the image group can be obtained.
The type of the video frame may indicate whether the video frame needs to be encoded with reference to other video frames in the group of pictures when encoded. The types of video frames may include, for example, a key frame type, a forward reference frame type, and a bi-directional reference frame type, each type of video frame having a different video encoding scheme. Video frames of the key frame type may be referred to as key frames, video frames of the forward reference frame type may be referred to as forward reference frames, and video frames of the bi-directional reference frame type may be referred to as bi-directional reference frames.
For example, when video coding a key frame, the image data in the key frame may be coded directly. A forward reference frame may be coded with reference to preceding video frames. A bidirectional reference frame may be coded with reference to both the preceding and the following video frames. The key frame may be coded using intra-frame compression, while the forward reference frame and the bidirectional reference frame may be coded using inter-frame compression. It will be appreciated that the first video frame in an image group is typically of the key frame type.
In one embodiment, different types of video frames are coded differently. If the type of a video frame in the image group is the key frame type, the coding mode corresponding to key frames is acquired, and video coding processing is performed on the key frame in the image group using that coding mode and the video coding parameters of the image group, to obtain the coded data of the key frame in the image group.
Further, if the type of the video frame in the image group is the forward reference frame type, acquiring a coding mode corresponding to the forward reference frame, and performing video coding processing on the forward reference frame in the image group by adopting the coding mode corresponding to the forward reference frame and the video coding parameters of the image group to obtain the coded data of the forward reference frame in the image group.
Further, if the type of the video frame in the image group is the bidirectional reference frame type, acquiring a coding mode corresponding to the bidirectional reference frame, and performing video coding processing on the bidirectional reference frame in the image group by adopting the coding mode corresponding to the bidirectional reference frame and the video coding parameters of the image group to obtain coded data of the bidirectional reference frame in the image group.
Since the scenes to which the plurality of video frames included in one image group belong are generally the same, the video encoding parameters of the plurality of video frames in one image group are the same, but since the types of the plurality of video frames in one image group are different, the video encoding processing can be performed on each video frame by adopting the same video encoding parameters and the video encoding modes corresponding to the video frames of various types, so as to obtain corresponding encoded data.
Optionally, the key frame may be referred to as an I frame (Intra picture), the forward reference frame as a P frame (Predictive frame), and the bidirectional reference frame as a B frame (Bi-directional interpolated prediction frame). After coding, an I frame carries a complete image picture; a coded P frame carries no complete picture data, only the picture difference from the preceding video frame; and a coded B frame carries only the differences from the pictures of the preceding and following video frames. Thus, during video decoding processing, an I frame can by itself be decompressed into a single complete video picture by the video decompression algorithm. A P frame can be decoded into a complete video picture with reference to the I frame or P frame preceding it. A B frame generates a complete video picture with reference to the I frame or P frame before it and the P frame after it. I-frame coding removes the redundant information of a video frame in the spatial dimension, while P frames and B frames remove the temporally redundant information between video frames: a P frame references forward, so its residual is smaller, and a B frame references in both directions, so its coding efficiency is higher.
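The reference structure of the three frame types can be made concrete with a toy model in which a "picture" is a single number, so that a stored difference is literal subtraction. This is only a sketch of the dependency pattern, not a real intra/inter compression scheme.

```python
def encode_frame(frame_type, current, prev=None, nxt=None):
    if frame_type == "I":       # intra-coded: complete picture, no references
        return {"type": "I", "picture": current}
    if frame_type == "P":       # forward reference: difference from the preceding frame
        return {"type": "P", "diff": current - prev}
    if frame_type == "B":       # bidirectional: difference from both neighbours
        return {"type": "B", "diff": current - (prev + nxt) / 2}
    raise ValueError(f"unknown frame type: {frame_type}")

print(encode_frame("I", 10.0))                          # {'type': 'I', 'picture': 10.0}
print(encode_frame("P", 12.0, prev=10.0))               # {'type': 'P', 'diff': 2.0}
print(encode_frame("B", 11.0, prev=10.0, nxt=12.0))     # {'type': 'B', 'diff': 0.0}
```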
In one embodiment, each video frame may also be video encoded in combination with the code rate of each video frame in the group of pictures. Specifically, prediction processing can be performed on a plurality of video frames contained in the image group, so as to obtain complexity of the plurality of video frames contained in the image group; determining code rates of a plurality of video frames contained in the image group based on the complexity of the plurality of video frames contained in the image group; and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group, the types of the plurality of video frames contained in the image group and the code rates of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
The complexity of a video frame indicates how much image information the frame contains. The higher the complexity, the more image information the frame contains, and the higher the code rate that can be allocated to it for video coding; this improves coding accuracy and makes the coding finer. Frames with lower complexity can be allocated a lower code rate, improving video coding efficiency.
In a specific implementation, when the buffered frames are acquired, the complexity of each buffered frame may be estimated in advance so that the complexity of each is determined; subsequently, when video coding each image group, the coding can be performed in combination with the complexity of each video frame in the group. The complexity of each video frame may be determined, for example, by calculating the similarity between adjacent video frames. Alternatively, a complexity calculation algorithm may be used to perform prediction processing on the plurality of video frames contained in the image group to obtain their complexities. Complexity calculation algorithms may include, but are not limited to, the Sum of Absolute Transformed Differences (SATD) algorithm.
In the embodiment of the present application, for example, the complexity of each video frame may be calculated frame by frame, and an initial code rate allocated to all video frames based on the complexity of each video frame in the image group. For a video encoder, once the code rate is set, the encoder encodes at that rate, but the encoder's output rate is accounted in seconds, e.g., how many bits per second. How much of the rate to allocate to each individual video frame therefore still differs: for example, if 2000 megabits of code rate are allocated to 1 second of video as a whole, the rates allocated to the individual video frames within that second must sum to 2000 megabits. Optionally, an initial code rate and quantization parameters may also be assigned to all video frames based on the complexity of each frame in the image group; quantization parameters describe quality loss at a finer granularity, the frame level or block level, in the video encoder.
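A sketch of the per-frame allocation described above: the per-second budget is split across the buffered frames in proportion to their estimated complexity, so the per-frame rates sum exactly to the budget. A real encoder would estimate complexity with something like SATD; a precomputed list stands in for that here, and the proportional rule is an illustrative assumption.

```python
def allocate_rates(complexities, total_budget):
    """Split a per-second code-rate budget across frames by relative complexity."""
    total = sum(complexities)
    return [total_budget * c / total for c in complexities]

complexities = [4.0, 1.0, 1.0, 2.0]          # higher = more image information in the frame
rates = allocate_rates(complexities, total_budget=2000)
print(rates)                                  # [1000.0, 250.0, 250.0, 500.0]
assert abs(sum(rates) - 2000) < 1e-9          # allocations sum to the overall budget
```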
For example, during coding, the video frames are first divided into image groups, one image group being regarded as a basic video unit. If the image group is set to 32 frames, the frame type of each of frames 0 to 31 needs to be determined. The first frame is an I frame: it is the most important and highest-quality video frame, and many subsequent video frames will reference it when coding. In video coding, the more important video frames may be coded first; for example, frame 0 may be coded first and then frame 31, since frames 0 and 31 are the video frames of higher importance in the image group. Frame 16 is then coded, and it can be coded with the first frame as its reference frame. If the scene switches, the current frame may be coded as a new I frame.
In one possible implementation, the frame types of the video frames in a group of pictures (GOP) may also be changed according to the GOP length (i.e., the number of video frames in the group) and external forced-I-frame requests: it is determined which of the types I, B and P each frame in the GOP takes, and the number of frames whose type needs to be changed to B in each GOP is set. B frames are then inserted into the GOP according to that number, i.e., it is decided which P frames in the GOP are changed into B frames. For example, if a GOP is set to contain 3 B frames, the positions in the image group at which those 3 frames are inserted, i.e., which P frames are changed, are selected. The frame type may also be changed according to scene-switch detection: when a scene switch in the live scene is detected, for example the front camera is switched to the rear camera, or a new product appears and the whole live picture changes, the current frame is changed into an I frame so that it can serve as a reference frame in video coding. By changing a P frame or B frame into an I frame, or inserting B frames into the image group, the video quality can be made clearer and the coding efficiency higher; the total number of video frames in the GOP is unchanged after frame types are changed. After a frame type is changed, the video coding mode used for that frame in subsequent coding changes correspondingly, so the coded video effect is better.
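A sketch of the frame-type adaptation just described: the first frame of the GOP is an I frame, a configured number of P frames are changed into B frames, and a detected scene cut forces the frame at that position to become an I frame, leaving the total frame count unchanged. Which P frames are changed is an illustrative choice (here, evenly spaced).

```python
def assign_frame_types(gop_len, num_b_frames, scene_cut_positions=()):
    types = ["I"] + ["P"] * (gop_len - 1)          # total frame count stays unchanged
    step = max(gop_len // (num_b_frames + 1), 1)
    for k in range(1, num_b_frames + 1):           # change selected P frames into B frames
        pos = k * step
        if 0 < pos < gop_len:
            types[pos] = "B"
    for pos in scene_cut_positions:                # scene switch -> force an I frame
        types[pos] = "I"
    return types

print(assign_frame_types(gop_len=8, num_b_frames=3, scene_cut_positions=(5,)))
# ['I', 'P', 'B', 'P', 'B', 'I', 'B', 'P']
```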
In one embodiment, after video encoding processing is performed in combination with a scene to obtain encoded data, the encoded data of a plurality of video frames included in the image group may be transmitted to the client, so that the client performs video decoding processing on the encoded data of the plurality of video frames included in the image group to obtain decoded data, performs rendering processing on the decoded data, and outputs the decoded data after rendering processing. Wherein the client may comprise a viewer side or a host side.
In the embodiment of the application, video coding processing is performed on the plurality of video frames in combination with the specific live scene. When the coded data of the plurality of video frames is transmitted to the client, the client can decode it using the video decoding mode corresponding to the video coding mode to obtain decoded data, render the decoded data, and display the rendered result at the client, so that the anchor's broadcasting and the audience's viewing of the live video data are realized, achieving the effect of live video.
In an alternative implementation manner, if the number of image groups included in the video data to be processed is multiple, any two continuous image groups in the multiple image groups are a first image group and a second image group, and the first image group and the second image group belong to different target scenes; the manner of determining the video coding parameters for the group of pictures based on the target scene to which the group of pictures belongs may be as follows:
Determining video coding parameters for the first image group based on a target scene to which the first image group belongs; determining video coding parameters for the second group of images based on a target scene to which the second group of images belongs; the video encoding parameters of the second group of pictures are different from the video encoding parameters of the first group of pictures.
For example, when the acquired video data to be processed happens to include video data of two live scenes, corresponding video coding parameters can be selected for the video data of each live scene and corresponding video coding processing performed, rather than applying the same video processing mode and video coding parameters to both, so that the corresponding video processing effects can be improved.
The technical solution in the embodiment of the present application is applied to a live broadcast service, where the stability, definition, fluency and low latency of the service must be guaranteed, e.g., the technical indexes and client indexes remain stable and normal; the technical indexes may include a click-through rate, and the client indexes may include the user's subjective experience, retention rate, and so on. By introducing a self-developed software encoding mode, i.e., setting different video coding parameters according to the scene, the live image quality can be improved. Software encoding supports modification of coding parameters and can save code rate; by subdividing the specific live scene under a fixed code rate, better coding performance and coding effects can be achieved. Performing video coding with different video coding parameters for different service scenes yields different coding quality for the video frames in an image group while the total coding rate of the whole image group is unchanged, improving the experience at the viewer end. Selecting different video coding parameters for different live scenes can improve the image quality, fluency or latency of the overall live service, improving the subjective experience at the viewer end. The specific live scene in the video live broadcast is distinguished, and the video coding parameters are adaptively adjusted in different scenes, ensuring that the video coding effect is efficient and meets expectations.
In the embodiments of the present application, different parameters are selected for different live scenes through the self-developed soft-encoding mode, which improves the targeted live broadcasting effect. For example, for a co-streaming (link-mic) live scene, selecting the corresponding video coding parameters for encoding can reduce live-streaming latency and improve the viewing experience. For a hot live broadcast, such as a popular anchor's live-commerce session where many viewers are watching but no co-streaming requests are present, selecting the corresponding video coding parameters for encoding the video data can improve the video quality and thus the image quality. For a normal live broadcast with no co-streaming requests and few viewers, such as a game or sports live stream, selecting the corresponding video coding parameters can improve video fluency. Therefore, for each live scene, video encoding needs to be performed in a targeted manner to improve the users' subjective experience.
In the embodiment of the present application, an image group contained in the video data to be processed and the target scene to which the image group belongs are acquired, the image group comprising a plurality of video frames; video coding parameters are determined for the image group based on the target scene to which it belongs, and these parameters are used for video encoding of the plurality of video frames contained in the image group; the types of the plurality of video frames are respectively acquired, and video encoding is performed on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames, to obtain encoded data of the plurality of video frames contained in the image group. When the video frames in an image group are encoded, the corresponding video coding parameters are selected in combination with the scene to which the image group belongs, so that the video data can be encoded in combination with the specific scene, which enriches the available video coding modes. In addition, performing the encoding with the coding mode selected according to each video frame's type improves the accuracy of the video encoding.
Further, referring to fig. 5, fig. 5 is a flowchart of another video encoding method according to an embodiment of the present application; as shown in fig. 5, the video encoding method may be applied to a video processing system, where the video processing system is used for performing video encoding processing and video playing on video data, and the video processing system includes a server and a client, and the video encoding method includes, but is not limited to, the following steps:
S201, the server acquires a first image group contained in the first video data and a target scene to which the first image group belongs.
In this embodiment of the present application, the first video data may be the same as the video data to be processed, or may be part of the video data to be processed. For example, the first video data may refer to video data acquired after the anchor side starts broadcasting; once the scene to which the first video data belongs is determined, video encoding can be performed in combination with that scene and the result displayed.
In one implementation, the target scene to which the image group belongs may be determined in the following manner: scene indication information may be acquired, where the scene indication information is used to indicate the scene to which the image group belongs and is obtained by the service end through detection of service data; the scene indicated by the scene indication information is then determined as the target scene to which the image group belongs.
That is, during the live broadcast, the service end may detect service data, determine the current scene based on the detected service data, and generate scene indication information indicating which scene was detected as the current scene. By transmitting the scene indication information to the server, the server can determine the scene indicated by the scene indication information as the target scene to which the image group belongs. The service data may indicate the service type of the current live broadcast room, and the live broadcast scene corresponding to that service type may be determined as the scene corresponding to the service data. The service end can thus rapidly detect the live broadcast scene corresponding to the service data and thereby determine the target scene to which the image group belongs.
Optionally, the service end can detect service data throughout the live broadcast and determine the current scene, which improves the accuracy of scene detection and ensures its timeliness. Alternatively, the service end may detect the service data once every target time period to determine the current scene. Or the service end may detect the service data to determine the current scene only upon receiving a scene detection instruction, thereby reducing overhead.
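A minimal sketch of the service-end detection described above, assuming a JSON message shape and field names that are illustrative only:

    import json
    import time

    def detect_scene_from_service_data(service_data: dict) -> str:
        """Map the live room's service type to a live scene label (hypothetical fields)."""
        if service_data.get("mic_link_active"):
            return "co_streaming"            # first scene: link-mic, latency-sensitive
        if service_data.get("viewer_count", 0) > service_data.get("hot_threshold", 10000):
            return "hot_live"                # third scene: many viewers
        return "normal_live"                 # second scene: few viewers

    def make_scene_indication(service_data: dict) -> str:
        """Build the scene indication information sent from the service end to the server."""
        return json.dumps({
            "scene": detect_scene_from_service_data(service_data),
            "detected_at": time.time(),
        })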
In another implementation, the target scene to which the group of images belongs may be determined by image detection. For example, image recognition may be performed on a plurality of video frames in the image group, respectively, to obtain scene recognition results for the plurality of video frames in the image group; the scene recognition result of each video frame is used for indicating the scene to which each video frame belongs; and determining a target scene to which the image group belongs based on scene recognition results of the plurality of video frames.
By performing image recognition on each video frame in the image group, the scene recognition result of each video frame can be determined, and the target scene to which the image group belongs can then be determined based on the scene recognition results of the plurality of video frames. For example, if, among the plurality of video frames contained in the image group, the number of video frames belonging to a first scene is greater than the number belonging to a second scene, the first scene may be determined as the target scene to which the image group belongs. Alternatively, if a first number of video frames in the image group belong to the first scene and a second number belong to the second scene, the target scene to which the image group belongs may be determined to include both the first scene and the second scene. Subsequently, when determining the video coding parameters of the image group, the video coding parameters for the first number of video frames and those for the second number of video frames can be determined respectively, so that the plurality of video frames in the image group are encoded based on their respective video coding parameters.
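A minimal sketch of the majority-vote determination, assuming one recognized scene label per frame (names are illustrative):

    from collections import Counter

    def gop_target_scene(frame_scenes: list[str]) -> str:
        """Return the scene to which the most frames in the image group belong."""
        # e.g. ["hot_live", "hot_live", "normal_live"] -> "hot_live"
        return Counter(frame_scenes).most_common(1)[0][0]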
In an alternative implementation manner, a scene recognition model may be used to perform image recognition on the plurality of video frames in the image group, so as to obtain scene recognition results for the plurality of video frames in the image group. The scene recognition model may include, for example, but is not limited to, convolutional neural network models (Convolutional Neural Network, CNN), recurrent neural network models (Recurrent Neural Network, RNN), deep neural network models (Deep Neural Networks, DNN), and so on.
In an alternative implementation manner, a plurality of training samples and sample labels may be obtained in advance to train the scene recognition model. For example, the training samples may be a plurality of sample video frames contained in a plurality of sample image groups, and the sample labels may be used to indicate the sample scene to which each of the sample video frames actually belongs. When the scene recognition model is trained with the training samples, the model is adjusted based on the difference between the sample scene it outputs for each sample video frame and the corresponding sample label, so that the trained model acquires the ability to recognize the scene to which a video frame belongs. The trained scene recognition model can therefore be used to perform image recognition on the plurality of video frames in the image group, obtaining scene recognition results for those frames.
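As an illustrative sketch of such training, assuming a small PyTorch CNN and cross-entropy against the sample labels (the architecture and data wiring are assumptions, not the embodiment's model):

    import torch
    import torch.nn as nn

    class SceneCNN(nn.Module):
        def __init__(self, num_scenes: int):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, num_scenes)

        def forward(self, frames):  # frames: (N, 3, H, W)
            return self.classifier(self.features(frames).flatten(1))

    def train_step(model, optimizer, frames, scene_labels):
        # adjust the model based on the gap between predicted and labeled scenes
        loss = nn.functional.cross_entropy(model(frames), scene_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()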
In this embodiment of the present application, the manner of determining the target scene to which the first image group contained in the first video data belongs may refer to the manner of determining the target scene to which the image group contained in the video data to be processed belongs, which will not be repeated here. Acquiring the target scene of the first image group facilitates targeted video encoding in combination with that scene, improving the video encoding effect.
S202, the server determines video coding parameters for the first image group based on the target scene to which the first image group belongs.
S203, the server respectively acquires the types of the plurality of video frames contained in the first image group, and performs video encoding on the plurality of video frames contained in the first image group based on the video encoding parameters of the first image group and the types of the plurality of video frames contained in the first image group, to obtain encoded data of the plurality of video frames contained in the first image group.
S204, the server transmits the coded data of the plurality of video frames contained in the first image group to the client.
S205, the client performs video decoding processing on the encoded data of the plurality of video frames included in the first image group to obtain first decoded data, performs rendering processing on the first decoded data, and outputs the first decoded data after rendering processing.
In this embodiment, the specific implementation manner of step S201 to step S205 may refer to the implementation manner of step S101 to step S103, and will not be described herein.
S206, if the second video data is acquired, and the target scene to which the second image group included in the second video data belongs is different from the target scene to which the first image group belongs, the server determines video coding parameters for the second image group based on the target scene to which the second image group belongs.
The second video data may be the same as the video data to be processed, or may be part of it, where the target scene to which the second image group contained in the second video data belongs is different from the target scene to which the first image group belongs. For example, the first video data and the second video data may belong to different time periods of the same live video session, with the acquisition period of the first video data earlier than that of the second video data: after the anchor side starts broadcasting, the first video data is acquired and encoded and displayed in combination with its scene; after a period of time, the second video data is acquired and encoded and displayed in combination with its scene. For example, the first video data may refer to video data acquired in a co-streaming (link-mic) live scene, and the second video data may refer to video data acquired in a normal live scene.
In this embodiment of the present application, the manner of determining the target scene to which the second image group contained in the second video data belongs may likewise refer to the manner of determining the target scene to which the image group contained in the video data to be processed belongs, which will not be repeated here. Acquiring the target scene of the second image group facilitates targeted video encoding in combination with that scene, improving the video encoding effect. Further, by acquiring video data of different time periods within the live broadcast, scene recognition can be performed on each period's video data to determine whether the live scene has changed; if it has, video encoding can be performed with the video coding parameters of the new scene, rather than continuing with the parameters from before the scene change, thereby improving the video encoding effect.
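A minimal sketch of this scene-change handling (names hypothetical):

    def on_new_video_data(gop, current_scene, current_params, detect_scene, select_params):
        """Re-select coding parameters when the newly detected scene differs."""
        scene = detect_scene(gop)
        if scene != current_scene:
            # the live scene changed: do not keep encoding with the old parameters
            current_scene, current_params = scene, select_params(scene)
        return current_scene, current_params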
S207, the server acquires the types of the plurality of video frames contained in the second image group respectively, and performs video encoding processing on the plurality of video frames contained in the second image group based on the video encoding parameters of the second image group and the types of the plurality of video frames contained in the second image group respectively to obtain encoded data of the plurality of video frames contained in the second image group.
S208, the server transmits the encoded data of the plurality of video frames included in the second image group to the client.
S209, the client performs video decoding processing on the encoded data of the plurality of video frames included in the second image group to obtain second decoded data, performs rendering processing on the second decoded data, and outputs the second decoded data after the rendering processing.
In this embodiment, the specific implementation manners of step S206 to step S209 may refer to the implementation manners of step S101 to step S103, and are not described herein again.
In the embodiment of the present application, when a live scene change is determined, the video coding parameters of the corresponding scene are selected for encoding, so that the coding parameters can be adjusted dynamically as the live scene changes, improving the video coding effect. Selecting different video coding parameters for different live scenes can improve the image quality, fluency, or latency of the overall live streaming service and improve the subjective experience on the viewer side. Specific live scenes are distinguished during live video streaming, and the video coding parameters are adjusted adaptively per scene so that the video coding effect is efficient and meets expectations.
In the embodiment of the present application, an image group contained in the video data to be processed and the target scene to which the image group belongs are acquired, the image group comprising a plurality of video frames; video coding parameters are determined for the image group based on the target scene to which it belongs, and these parameters are used for video encoding of the plurality of video frames contained in the image group; the types of the plurality of video frames are respectively acquired, and video encoding is performed on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames, to obtain encoded data of the plurality of video frames contained in the image group. When the video frames in an image group are encoded, the corresponding video coding parameters are selected in combination with the scene to which the image group belongs, so that the video data can be encoded in combination with the specific scene, which enriches the available video coding modes. In addition, performing the encoding with the coding mode selected according to each video frame's type improves the accuracy of the video encoding.
Having described the methods of embodiments of the present application, the apparatus of embodiments of the present application are described below.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a video encoding apparatus according to an embodiment of the present application, where the video encoding apparatus may be deployed on a computer device; the video encoding device may be configured to perform corresponding steps in the video encoding method provided in the embodiments of the present application, where the video encoding device 60 includes:
a data acquisition unit 601, configured to acquire an image group included in video data to be processed, and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
a parameter determining unit 602, configured to determine video coding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
the video encoding unit 603 is configured to obtain types of the plurality of video frames, and perform video encoding processing on the plurality of video frames included in the image group based on the video encoding parameters of the image group and the types of the plurality of video frames, so as to obtain encoded data of the plurality of video frames included in the image group.
Optionally, the video coding parameters of the group of pictures include at least one of:
Referring to the number of frames, searching the fractional pixels for precision, and caching the number of frames;
the reference frame number refers to the number of video frames that need to be referenced when encoding each video frame contained in the image group; the fractional pixel search precision is used to search, in the reference frames, for a reference image block matching a target image block in each video frame, where a reference image block is an image block in a reference frame whose pixel similarity with the target image block is greater than a similarity threshold; the buffer frame number refers to the number of buffered video frames and is used to allocate the code rate of each video frame in the image group.
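As a toy illustration of the reference-block matching that the fractional pixel search precision governs (integer-pixel full search only; real encoders would refine to half- and quarter-pixel positions by interpolation, and the similarity measure and threshold convention here are assumptions):

    import numpy as np

    def find_reference_block(ref_frame: np.ndarray, target: np.ndarray, threshold: float):
        """Return the top-left position of a block in ref_frame whose similarity
        with target exceeds threshold, using negative SAD as similarity."""
        bh, bw = target.shape
        best_sim, best_pos = float("-inf"), None
        for y in range(ref_frame.shape[0] - bh + 1):
            for x in range(ref_frame.shape[1] - bw + 1):
                cand = ref_frame[y:y + bh, x:x + bw].astype(np.int32)
                sim = -np.abs(cand - target.astype(np.int32)).sum()
                if sim > best_sim:
                    best_sim, best_pos = sim, (y, x)
        # with negative SAD, threshold is negative, e.g. -(bh * bw * 8)
        return best_pos if best_sim > threshold else None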
Optionally, the target scene to which the image group belongs includes a first scene, a second scene, or a third scene, where the first scene is a scene in which co-streaming (link-mic) live broadcasting is required, the second scene is a scene in which no co-streaming is required and the number of live viewers is less than or equal to a preset number, and the third scene is a scene in which no co-streaming is required and the number of live viewers is greater than the preset number; the parameter determining unit 602 is specifically configured to:
if the target scene to which the image group belongs is a first scene, determining that the reference frame number belongs to a first number interval, the fractional pixel searching precision belongs to a first searching precision interval, and the buffer frame number belongs to a first buffer number interval;
If the target scene to which the image group belongs is a second scene, determining that the reference frame number belongs to a first number interval, the fractional pixel searching precision belongs to a second searching precision interval, and the buffer frame number belongs to a second buffer number interval; the minimum value of the second search precision interval is larger than the maximum value of the first search precision interval, and the minimum value of the second buffer quantity interval is larger than the maximum value of the first buffer quantity interval;
if the target scene to which the image group belongs is a third scene, determining that the reference frame number belongs to a second number interval, the fractional pixel search precision belongs to the second search precision interval, and the buffer frame number belongs to a third buffer number interval; the minimum value of the second number interval is larger than the maximum value of the first number interval, and the minimum value of the third buffer number interval is larger than the maximum value of the second buffer number interval.
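An illustrative mapping of the three scenes to parameter intervals consistent with the ordering constraints above; the concrete numbers are assumptions (loosely in the spirit of x264-style reference-frame, sub-pixel motion-estimation, and lookahead settings), not values taken from the embodiment:

    SCENE_PARAM_INTERVALS = {
        # scene: (reference-frame interval, fractional-pixel precision interval,
        #         buffered-frame interval)
        "first":  ((1, 2), (1, 3), (0, 5)),    # link-mic: everything small -> low latency
        "second": ((1, 2), (4, 7), (10, 25)),  # normal live: higher precision and buffering
        "third":  ((3, 6), (4, 7), (30, 60)),  # hot live: more references, deepest buffering
    }

    def pick_params(scene: str) -> dict:
        """Pick a representative value (the midpoint) from each interval."""
        refs, subpel, lookahead = SCENE_PARAM_INTERVALS[scene]
        mid = lambda iv: (iv[0] + iv[1]) // 2
        return {
            "ref_frames": mid(refs),
            "subpel_precision": mid(subpel),
            "lookahead_frames": mid(lookahead),
        }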
Optionally, the types of the video frames include a key frame type, a forward reference frame type and a bidirectional reference frame type, and the video coding modes of the video frames of each type are different; the video encoding unit 603 is specifically configured to:
if the type of the video frame in the image group is the key frame type, acquiring a coding mode corresponding to the key frame, and performing video coding processing on the key frame in the image group by adopting the coding mode corresponding to the key frame and the video coding parameters of the image group to obtain the coding data of the key frame in the image group;
If the type of the video frame in the image group is the forward reference frame type, acquiring a coding mode corresponding to the forward reference frame, and performing video coding processing on the forward reference frame in the image group by adopting the coding mode corresponding to the forward reference frame and video coding parameters of the image group to obtain coded data of the forward reference frame in the image group;
if the type of the video frame in the image group is the type of the bidirectional reference frame, acquiring a coding mode corresponding to the bidirectional reference frame, and performing video coding processing on the bidirectional reference frame in the image group by adopting the coding mode corresponding to the bidirectional reference frame and the video coding parameters of the image group to obtain the coding data of the bidirectional reference frame in the image group.
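A minimal dispatch sketch for this type-dependent encoding (the frame-type tags and encoder callables are hypothetical):

    def encode_gop_by_type(frames, gop_params, encoders):
        """encoders: dict mapping 'I' (key), 'P' (forward reference) and
        'B' (bidirectional reference) to an encode(frame, params) callable."""
        encoded = []
        for frame in frames:
            if frame.type not in encoders:
                raise ValueError(f"unknown frame type: {frame.type}")
            # each frame type uses its own coding mode plus the GOP-level parameters
            encoded.append(encoders[frame.type](frame, gop_params))
        return encoded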
Optionally, the video encoding unit 603 is specifically configured to:
performing prediction processing on a plurality of video frames contained in the image group to obtain the complexity of the plurality of video frames contained in the image group;
determining code rates of a plurality of video frames contained in the image group based on the complexity of the plurality of video frames contained in the image group;
and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group, the types of the plurality of video frames contained in the image group and the code rates of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
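A minimal sketch of the complexity-driven rate allocation: each frame's share of the image group's fixed total bitrate is proportional to its predicted complexity (the complexity measure itself is left abstract here):

    def allocate_bitrates(complexities: list[float], gop_bitrate: float) -> list[float]:
        """Split a fixed GOP bitrate across frames in proportion to complexity."""
        total = sum(complexities)
        if total == 0:
            # degenerate case: share the budget evenly
            return [gop_bitrate / len(complexities)] * len(complexities)
        return [gop_bitrate * c / total for c in complexities]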
Optionally, the video data to be processed includes a plurality of image groups, any two consecutive image groups in the plurality of image groups are a first image group and a second image group, and the first image group and the second image group are different in target scene; the parameter determining unit 602 is specifically configured to:
determining video coding parameters for the first image group based on a target scene to which the first image group belongs;
determining video coding parameters for the second image group based on a target scene to which the second image group belongs; the video encoding parameters of the second group of pictures are different from the video encoding parameters of the first group of pictures.
Optionally, the data acquisition unit 601 is specifically configured to:
acquiring scene indication information; the scene indication information is used for indicating a scene to which the image group belongs;
determining the scene indicated by the scene indication information as a target scene to which the image group belongs; the scene indication information is obtained by the service end through detecting service data.
Optionally, the data acquisition unit 601 is specifically configured to:
respectively carrying out image recognition on a plurality of video frames in the image group to obtain scene recognition results aiming at the plurality of video frames in the image group; the scene recognition result of each video frame is used for indicating the scene to which each video frame belongs;
And determining a target scene to which the image group belongs based on scene recognition results of the plurality of video frames.
Optionally, the video encoding device 60 further includes a data transmitting unit 604, where the data transmitting unit 604 is configured to:
transmitting the encoded data of the plurality of video frames contained in the image group to a client, enabling the client to perform video decoding processing on the encoded data of the plurality of video frames contained in the image group to obtain decoded data, performing rendering processing on the decoded data, and outputting the decoded data after the rendering processing.
It should be noted that, in the embodiment corresponding to fig. 6, the content not mentioned may be referred to the description of the method embodiment, and will not be repeated here.
In the embodiment of the present application, an image group contained in the video data to be processed and the target scene to which the image group belongs are acquired, the image group comprising a plurality of video frames; video coding parameters are determined for the image group based on the target scene to which it belongs, and these parameters are used for video encoding of the plurality of video frames contained in the image group; the types of the plurality of video frames are respectively acquired, and video encoding is performed on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames, to obtain encoded data of the plurality of video frames contained in the image group. When the video frames in an image group are encoded, the corresponding video coding parameters are selected in combination with the scene to which the image group belongs, so that the video data can be encoded in combination with the specific scene, which enriches the available video coding modes. In addition, performing the encoding with the coding mode selected according to each video frame's type improves the accuracy of the video encoding.
Referring to fig. 7, fig. 7 is a schematic diagram of the composition structure of a computer device according to an embodiment of the present application. As shown in fig. 7, the computer device 70 may include: a processor 701, a network interface 704, and a memory 705. In addition, the computer device 70 may further include a user interface 703 and at least one communication bus 702, where the communication bus 702 is used to enable connection and communication between these components. The user interface 703 may include a display screen (Display) and a keyboard (Keyboard), and optionally may further include a standard wired interface and a wireless interface. The network interface 704 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 705 may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory. The memory 705 may optionally also be at least one storage device located remotely from the aforementioned processor 701. As shown in fig. 7, the memory 705, which is a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 70 shown in fig. 7, the network interface 704 may provide network communication functions, the user interface 703 mainly serves as an interface for the user to provide input, and the processor 701 may be configured to invoke the device control application stored in the memory 705 to implement:
Acquiring an image group contained in video data to be processed and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
determining video coding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
and respectively acquiring the types of the plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
It should be understood that the computer device 70 described in the embodiment of the present application may perform the description of the video encoding method described above in the embodiment corresponding to fig. 2 and 5, and may also perform the description of the video encoding apparatus described above in the embodiment corresponding to fig. 6, which are not repeated herein. In addition, the description of the beneficial effects of the same method is omitted.
In the embodiment of the present application, an image group contained in the video data to be processed and the target scene to which the image group belongs are acquired, the image group comprising a plurality of video frames; video coding parameters are determined for the image group based on the target scene to which it belongs, and these parameters are used for video encoding of the plurality of video frames contained in the image group; the types of the plurality of video frames are respectively acquired, and video encoding is performed on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames, to obtain encoded data of the plurality of video frames contained in the image group. When the video frames in an image group are encoded, the corresponding video coding parameters are selected in combination with the scene to which the image group belongs, so that the video data can be encoded in combination with the specific scene, which enriches the available video coding modes. In addition, performing the encoding with the coding mode selected according to each video frame's type improves the accuracy of the video encoding.
The present application also provides a computer-readable storage medium storing a computer program. The computer program comprises program instructions which, when executed by a processor of a computer device (for example, the processor 701 described above), cause the computer device to perform the method of the foregoing embodiments. As an example, the program instructions may be executed on one computer device, or on multiple computer devices located at one site, or on multiple computer devices distributed across multiple sites and interconnected by a communication network; such multiple computer devices may constitute a blockchain network.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, may include processes of the embodiments of the methods as described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random-access Memory (Random Access Memory, RAM), or the like.
The foregoing disclosure describes only preferred embodiments of the present application and is not intended to limit the scope of the claims of the present application; equivalent variations made according to the claims of the present application shall still fall within the scope of the present application.

Claims (12)

1. A method of video encoding, the method comprising:
acquiring an image group contained in video data to be processed and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
determining video coding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
and respectively acquiring types of the plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
2. The method of claim 1, wherein the video coding parameters of the group of pictures comprise at least one of:
referring to the number of frames, searching the fractional pixels for precision, and caching the number of frames;
The reference frame number refers to the number of video frames that need to be referenced when encoding each video frame contained in the image group; the fractional pixel search precision is used to search, in the reference frames, for a reference image block matching a target image block in each video frame, a reference image block being an image block in a reference frame whose pixel similarity with the target image block is greater than a similarity threshold; the buffer frame number refers to the number of buffered video frames and is used to allocate the code rate of each video frame in the image group.
3. The method according to claim 2, wherein the target scene to which the image group belongs includes a first scene, a second scene or a third scene, the first scene being a scene in which co-streaming (link-mic) live broadcasting is required, the second scene being a scene in which no co-streaming is required and the number of live viewers is less than or equal to a preset number, and the third scene being a scene in which no co-streaming is required and the number of live viewers is greater than the preset number;
the determining video coding parameters for the image group based on the target scene to which the image group belongs includes:
If the target scene to which the image group belongs is a first scene, determining that the reference frame number belongs to a first number interval, the fractional pixel searching precision belongs to a first searching precision interval, and the buffer frame number belongs to a first buffer number interval;
if the target scene to which the image group belongs is a second scene, determining that the reference frame number belongs to a first number interval, the fractional pixel searching precision belongs to a second searching precision interval, and the buffer frame number belongs to a second buffer number interval; the minimum value of the second search precision interval is larger than the maximum value of the first search precision interval, and the minimum value of the second buffer quantity interval is larger than the maximum value of the first buffer quantity interval;
if the target scene to which the image group belongs is a third scene, determining that the reference frame number belongs to a second number interval, the fractional pixel search precision belongs to the second search precision interval, and the buffer frame number belongs to a third buffer number interval; the minimum value of the second number interval is larger than the maximum value of the first number interval, and the minimum value of the third buffer number interval is larger than the maximum value of the second buffer number interval.
4. The method of claim 1, wherein the types of video frames include a key frame type, a forward reference frame type, and a bi-directional reference frame type, each type of video frame having a different video encoding scheme;
the video encoding processing is performed on the plurality of video frames contained in the image group based on the video encoding parameters of the image group and the types of the plurality of video frames, so as to obtain encoded data of the plurality of video frames contained in the image group, including:
if the type of the video frame in the image group is a key frame type, acquiring a coding mode corresponding to the key frame, and performing video coding processing on the key frame in the image group by adopting the coding mode corresponding to the key frame and the video coding parameters of the image group to obtain the coding data of the key frame in the image group;
if the type of the video frame in the image group is the forward reference frame type, acquiring a coding mode corresponding to the forward reference frame, and performing video coding processing on the forward reference frame in the image group by adopting the coding mode corresponding to the forward reference frame and the video coding parameters of the image group to obtain coded data of the forward reference frame in the image group;
And if the type of the video frame in the image group is the bidirectional reference frame type, acquiring a coding mode corresponding to the bidirectional reference frame, and performing video coding processing on the bidirectional reference frame in the image group by adopting the coding mode corresponding to the bidirectional reference frame and the video coding parameters of the image group to obtain the coding data of the bidirectional reference frame in the image group.
5. The method according to any one of claims 1-4, wherein the performing video encoding processing on the plurality of video frames included in the image group based on the video encoding parameters of the image group and the types of the plurality of video frames to obtain encoded data of the plurality of video frames included in the image group includes:
performing prediction processing on a plurality of video frames contained in the image group to obtain the complexity of the plurality of video frames contained in the image group;
determining code rates of a plurality of video frames contained in the image group based on the complexity of the plurality of video frames contained in the image group;
and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group, the types of the plurality of video frames contained in the image group and the code rates of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
6. The method according to any one of claims 1 to 4, wherein the video data to be processed includes a plurality of image groups, any two consecutive image groups in the plurality of image groups are a first image group and a second image group, and the first image group and the second image group are different in target scene;
the determining video coding parameters for the image group based on the target scene to which the image group belongs includes:
determining video coding parameters for the first image group based on a target scene to which the first image group belongs;
determining video coding parameters for the second image group based on a target scene to which the second image group belongs; the video encoding parameters of the second group of pictures are different from the video encoding parameters of the first group of pictures.
7. The method according to any one of claims 1-4, wherein said acquiring a target scene to which said group of images belongs comprises:
acquiring scene indication information; the scene indication information is used for indicating the scene to which the image group belongs, and is obtained by a service end through detecting service data;
and determining the scene indicated by the scene indication information as a target scene to which the image group belongs.
8. The method according to any one of claims 1-4, wherein said acquiring a target scene to which said group of images belongs comprises:
respectively carrying out image recognition on a plurality of video frames in the image group to obtain scene recognition results aiming at the plurality of video frames in the image group; the scene recognition result of each video frame is used for indicating the scene to which each video frame belongs;
and determining a target scene to which the image group belongs based on scene recognition results of the plurality of video frames.
9. A video encoding device, the device comprising:
the data acquisition unit is used for acquiring an image group contained in the video data to be processed and a target scene to which the image group belongs; the image group comprises a plurality of video frames;
a parameter determining unit configured to determine video encoding parameters for the image group based on a target scene to which the image group belongs; the video coding parameters of the image group are used for carrying out video coding processing on a plurality of video frames contained in the image group;
the video coding unit is used for respectively acquiring the types of the plurality of video frames, and respectively carrying out video coding processing on the plurality of video frames contained in the image group based on the video coding parameters of the image group and the types of the plurality of video frames to obtain coded data of the plurality of video frames contained in the image group.
10. A computer device, comprising: a processor, a memory, and a network interface;
the processor is connected to the memory, the network interface for providing data communication functions, the memory for storing program code, the processor for invoking the program code to cause the computer device to perform the method of any of claims 1-8.
11. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program adapted to be loaded and executed by a processor to cause a computer device having the processor to perform the method of any of claims 1-8.
12. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any one of claims 1-8.
CN202311765472.1A 2023-12-20 2023-12-20 Video coding method, device, equipment, readable storage medium and product Pending CN117714700A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311765472.1A CN117714700A (en) 2023-12-20 2023-12-20 Video coding method, device, equipment, readable storage medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311765472.1A CN117714700A (en) 2023-12-20 2023-12-20 Video coding method, device, equipment, readable storage medium and product

Publications (1)

Publication Number Publication Date
CN117714700A true CN117714700A (en) 2024-03-15

Family

ID=90156813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311765472.1A Pending CN117714700A (en) 2023-12-20 2023-12-20 Video coding method, device, equipment, readable storage medium and product

Country Status (1)

Country Link
CN (1) CN117714700A (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106791850A (en) * 2016-12-05 2017-05-31 乐视控股(北京)有限公司 Method for video coding and device
CN107846605A (en) * 2017-01-19 2018-03-27 湖南快乐阳光互动娱乐传媒有限公司 System and method for generating streaming media data of anchor terminal, and system and method for live network broadcast
CN109561310A (en) * 2017-09-26 2019-04-02 腾讯科技(深圳)有限公司 Video coding processing method, device, equipment and storage medium
CN114727120A (en) * 2021-01-04 2022-07-08 腾讯科技(深圳)有限公司 Method and device for acquiring live broadcast audio stream, electronic equipment and storage medium
CN113115054A (en) * 2021-03-31 2021-07-13 杭州海康威视数字技术股份有限公司 Video stream encoding method, device, system, electronic device and storage medium
CN116095359A (en) * 2021-11-02 2023-05-09 腾讯科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium
CN114554211A (en) * 2022-01-14 2022-05-27 百果园技术(新加坡)有限公司 Content adaptive video coding method, device, equipment and storage medium
WO2023134523A1 (en) * 2022-01-14 2023-07-20 百果园技术(新加坡)有限公司 Content adaptive video coding method and apparatus, device and storage medium
CN117014659A (en) * 2023-10-07 2023-11-07 腾讯科技(深圳)有限公司 Video transcoding method and device, electronic equipment and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118264798A (en) * 2024-05-31 2024-06-28 摩尔线程智能科技(北京)有限责任公司 Video coding method and device, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination