
WO2020211422A1 - Video processing method and apparatus, and device - Google Patents


Info

Publication number
WO2020211422A1
WO2020211422A1, PCT/CN2019/126757, CN2019126757W
Authority
WO
WIPO (PCT)
Prior art keywords
images
video
image
frames
posture
Prior art date
Application number
PCT/CN2019/126757
Other languages
French (fr)
Chinese (zh)
Inventor
卢艺帆 (Lu Yifan)
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2020211422A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/222Studio circuitry; Studio devices; Studio equipment
    • H04N5/262Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects

Definitions

  • the embodiments of the present disclosure relate to the field of computer technology, and in particular to a video processing method, device, and equipment.
  • special effects can be added to the video.
  • special effects can include adding light flicker, adding preset sounds, and so on in the video.
  • when adding special effects to a video, a person usually watches the video to locate preset actions.
  • a special effect is associated with the playback moment corresponding to the preset action; during playback, when that moment is reached, the corresponding special effect is displayed in the video. For example, if a clapping action is manually observed at the 10th second of the video, a special effect is associated with the 10th second, and when the video plays to the 10th second, an applause-related special effect is displayed.
  • because special effects are added according to the playback time of the video, there may be a deviation between the moment the preset action appears in the video and the moment the corresponding special effect is displayed, resulting in poor accuracy of the special effects added to the video.
  • the embodiments of the present disclosure provide a video processing method, device, and equipment, which improve the accuracy of special effects added in a video.
  • embodiments of the present disclosure provide a video processing method, including:
  • adding special effects to the video according to the posture distribution of the first object and the N frames of images includes:
  • determining the posture distribution of the first object according to the posture type of the first object in each frame of image includes:
  • the N frames of images are grouped to obtain at least two groups of images, each group of images including consecutive M frames of images, where M is an integer greater than 1;
  • the posture distribution of the first object is determined according to the posture type corresponding to each group of images.
  • determining the posture type of the first object in the first image includes:
  • the object area is processed to determine the posture type of the first object in the first image.
  • detecting the object area in the first image includes:
  • the data representing the first image is input into a first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, each group of first samples including a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
  • processing the object area to determine the posture type of the first object in the first image includes:
  • the video is a video being shot; acquiring consecutive N frames of images in the video includes:
  • acquiring N frames of to-be-processed images in the video, where the N frames of to-be-processed images are the last N frames of images that have been captured in the video;
  • determining whether each of the N frames of to-be-processed images includes the first object, and if so, determining the N frames of to-be-processed images as the N frames of images.
  • adding the target special effect to the video according to the N frames of images includes:
  • the video is a video that has been filmed; the obtaining of consecutive N frames of images in the video includes:
  • the to-be-processed image selection operation includes: acquiring, starting from a preset image of the video, consecutive N frames of to-be-processed images in the video;
  • the N-frame image determination operation includes: determining whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images as the N frames of images, and if not, updating the preset image to a frame of image after the preset image in the video;
  • adding the target special effect to the video according to the N frames of images includes:
  • the special effect is added to at least one of the N frames of images.
  • the acquiring consecutive N frames of images in the video includes:
  • the N frames of images are determined in the video.
  • before acquiring the consecutive N frames of images in the video, the method further includes:
  • an embodiment of the present disclosure provides a video processing device, including an acquisition module, a first determination module, a second determination module, and an addition module, wherein:
  • the acquiring module is configured to acquire consecutive N frames of images in a video, each frame of the image includes a first object, and the N is an integer greater than 1;
  • the first determining module is configured to determine the posture type of the first object in each frame of image;
  • the second determining module is configured to determine the posture distribution of the first object according to the posture type of the first object in each frame of image, and the posture distribution is used to indicate the law of change of the posture of the first object;
  • the adding module is configured to add special effects to the video according to the posture distribution of the first object and the N frames of images.
  • the adding module is specifically used for:
  • the second determining module is specifically configured to:
  • the N frames of images are grouped to obtain at least two groups of images, each group of images including consecutive M frames of images, where M is an integer greater than 1;
  • the first determining module is specifically configured to:
  • the object area is processed to obtain the posture type of the first object in the first image.
  • the first determining module is specifically configured to:
  • the data representing the first image is input into a first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, each group of first samples including a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
  • the first determining module is specifically configured to:
  • the video is a video being shot;
  • the acquisition module is specifically configured to:
  • acquiring N frames of to-be-processed images in the video, where the N frames of to-be-processed images are the last N frames of images that have been captured in the video;
  • determining whether each of the N frames of to-be-processed images includes the first object, and if so, determining the N frames of to-be-processed images as the N frames of images.
  • the adding module is specifically used for:
  • the video is a completed video;
  • the acquisition module is specifically configured to:
  • the to-be-processed image selection operation includes: acquiring, from a preset image of the video, consecutive N frames of to-be-processed images in the video;
  • the operation of determining N frames of images includes: determining whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images as the N frames of images, and if not, updating the preset image to a frame of image after the preset image in the video;
  • the adding module is specifically used for:
  • the special effect is added to at least one of the N frames of images.
  • the acquisition module is specifically configured to:
  • the N frames of images are determined in the video.
  • the device further includes a third determining module, wherein:
  • the third determining module is configured to determine that the target special effect is not added to the N frames of images before the acquiring module acquires consecutive N frames of images in the video.
  • an embodiment of the present disclosure provides an electronic device, including: a processor coupled with a memory;
  • the memory is used to store a computer program
  • the processor is configured to execute the computer program stored in the memory, so that the terminal device executes the method according to any one of the foregoing first aspects.
  • an embodiment of the present disclosure provides a readable storage medium, including a program or instruction, and when the program or instruction runs on a computer, the method described in any one of the foregoing first aspect is executed.
  • with the video processing method, device and equipment, when a special effect corresponding to the first object needs to be added to the video, consecutive N frames of images including the first object are determined in the video, the posture type of the first object in each frame of image is obtained, the posture distribution of the first object is obtained according to the posture type of the first object in each frame of image, and the special effect is added to the video according to the posture distribution of the first object and the N frames of images.
  • the posture distribution of the first object in the video is determined with the video frame as the unit. According to the posture distribution of the first object in the video, whether a preset action appears in the video can be accurately determined, and thus whether to add special effects to the video can be accurately determined.
  • the special effects are added to the video based on consecutive N frames of images, that is, the special effects can be added to the video at the granularity of the video frame, which improves the accuracy of adding the special effects.
  • FIG. 1 is an architecture diagram of video processing provided by an embodiment of the disclosure
  • FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure
  • FIG. 3A is a schematic diagram of a video frame provided by an embodiment of the disclosure.
  • FIG. 3B is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • FIG. 4A is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • FIG. 4B is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • FIG. 5 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure.
  • FIG. 6 is a schematic diagram of a video processing process provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of a video processing device provided by an embodiment of the disclosure.
  • FIG. 8 is a schematic structural diagram of another video processing device provided by an embodiment of the disclosure.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure.
  • FIG. 1 is an architecture diagram of video processing provided by an embodiment of the disclosure.
  • when adding special effects to a video, it is usually judged whether a preset action (for example, clapping, shaking the head, etc.) appears in the video; when it is determined that the preset action appears in the video, the special effect corresponding to the preset action is added to the video.
  • suppose a special effect corresponding to a preset action needs to be added to the video (assuming that the preset action corresponds to the first object, that is, the preset action is performed by the first object; the first object can be hands, legs, a head, a vehicle, etc.).
  • Each extracted image can be recognized to obtain the posture type of the first object in each image, and the posture distribution of the first object can be obtained according to the posture type of the first object in each frame of image.
  • the posture distribution of the object satisfies the preset distribution, it can be determined that the preset action appears in the video, and the special effect corresponding to the preset action is added to the video.
  • the posture distribution of the first object in the video is determined with the video frame as the unit. According to the posture distribution of the first object in the video, whether a preset action appears in the video can be accurately determined, and thus whether to add special effects to the video can be accurately determined.
  • the special effects are added to the video based on consecutive N frames of images, that is, the special effects can be added to the video at the granularity of the video frame, which improves the accuracy of adding the special effects.
  • FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure. See Figure 2. The method can include:
  • the execution subject of the embodiments of the present disclosure may be an electronic device, or may be a video processing device provided in the electronic device.
  • the video processing device can be implemented by software, or by a combination of software and hardware.
  • the electronic device can be a mobile phone, a computer, a video camera with processing functions, and other devices.
  • each frame of image includes the first object, and N is an integer greater than 1.
  • Each frame of image includes complete video content.
  • the N frames of images are all key frames in the video.
  • the first object may be a hand, leg, head, vehicle, airplane, etc.
  • the special effect to be added to the video may be determined first, the first object corresponding to the special effect to be added to the video is determined, and N frames of images are determined in the video according to the first object.
  • the preset action corresponding to the special effect to be added in the video may be determined first, and the object performing the preset action is determined as the first object.
  • the special effect to be added in the video is a light special effect
  • the preset action corresponding to the light special effect is a clapping action
  • the object performing the clapping action is a hand. Therefore, the first object can be determined to be a hand, and accordingly, N consecutive frames of images that all include hands are determined in the video.
  • the process of determining consecutive N frames of images is also different.
  • it may include at least the following two possible application scenarios:
  • the video is a video being shot, that is, while the video is being shot, special effects are added to the video being shot.
  • N frames of to-be-processed images are obtained from the video, where the N frames of to-be-processed images are the last N frames of images that have been captured. It is determined whether each of the N frames of to-be-processed images includes the first object; if so, the N frames of to-be-processed images are determined as the N frames of images; if not, they are not. After a new image is captured, the N frames of to-be-processed images are updated, and the above process is repeated until the N frames of images are obtained.
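  The first scenario above can be sketched as a sliding window over frames as they are captured. The following is a minimal illustration only; the function name, frame representation, and `contains_object` predicate are assumptions for demonstration, not part of the patent:

  ```python
  from collections import deque

  def find_live_window(frame_stream, contains_object, n=6):
      """Scenario 1 (video being shot): keep a window of the last n
      captured frames; once every frame in the window contains the
      first object, those frames are the consecutive N frames."""
      window = deque(maxlen=n)
      for frame in frame_stream:
          window.append(frame)
          if len(window) == n and all(contains_object(f) for f in window):
              return list(window)
      return None  # still shooting: keep waiting for more frames

  # Demo mirroring FIG. 3B: frames are labelled 75..83 and frame 77
  # lacks the hand, so the window only closes at frames 78-83.
  demo = find_live_window(range(75, 84), lambda f: f != 77, n=6)
  ```

  The `deque(maxlen=n)` automatically discards the oldest frame, which matches the "update the N to-be-processed frames after a new image is captured" step.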
  • FIG. 3A is a schematic diagram of a video frame provided by an embodiment of the disclosure.
  • the first object is a hand
  • N is 6. Please refer to Figure 3A.
  • hands are included in the 75th through 80th frames. Since the last 6 captured frames (the 75th through 80th frames) all include hands, the 75th through 80th frames can be determined as the 6 consecutive frames of images.
  • FIG. 3B is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • the first object is a hand
  • N is 6. Please refer to Figure 3B.
  • the last captured image is the 80th frame, where the 75th-76th and 78th-80th frames include the hand and the 77th frame does not. Because one of the last 6 captured frames does not include a hand, shooting continues until, at time T2, the last captured frame is the 83rd frame and the 78th through 83rd frames all include hands;
  • the 78th through 83rd frames are then determined to be the 6 consecutive frames of images.
  • the video is a completed video, that is, special effects are added to the completed video.
  • continuous N frames of images can be obtained through the following feasible implementations: perform a to-be-processed image selection operation.
  • the to-be-processed image selection operation includes: starting from the preset image of the video, obtaining continuous images in the video N frames of images to be processed.
  • the N-frame image determination operation is performed. The N-frame image determination operation includes: judging whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, the N frames of to-be-processed images are determined as the N frames of images; if not, the preset image is updated to a frame after the preset image in the video. The to-be-processed image selection operation and the N-frame image determination operation are repeated until the N frames of images are obtained.
  • the preset image can be updated to a frame of image after the preset image in the video.
  • the preset image may be updated to the next frame image of the second image, the second image being the last image that does not include the first object in the N frames of images to be processed.
  • FIG. 4A is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • N is 6.
  • the preset image is the first frame of image.
  • therefore, it is determined that the N frames of to-be-processed images are the 1st through 6th frames. Since the 3rd frame among the 1st through 6th frames does not include the hand, the preset image is updated to the 2nd frame, and accordingly, the N frames of to-be-processed images are updated to the 2nd through 7th frames.
  • since the 3rd frame among the 2nd through 7th frames still does not include the hand, the preset image is updated to the 3rd frame, and correspondingly, the N frames of to-be-processed images are updated to the 3rd through 8th frames. Since the 3rd frame among the 3rd through 8th frames does not include hands, the preset image is updated to the 4th frame, and correspondingly, the N frames of to-be-processed images are updated to the 4th through 9th frames; since the 4th through 9th frames all include hands, the 4th through 9th frames are determined to be the 6 consecutive frames of images.
  • FIG. 4B is a schematic diagram of another video frame provided by an embodiment of the disclosure.
  • N is 6.
  • the preset image is the first frame of image.
  • the preset image is the first frame of image; therefore, it is determined that the N frames of to-be-processed images are the 1st through 6th frames. Since the 3rd frame among the 1st through 6th frames does not include hands, the second image is determined among the 1st through 6th frames: the 3rd frame is the last frame that does not include hands, so the 3rd frame is determined to be the second image. Therefore, the preset image is updated to the 4th frame (the frame following the second image), and accordingly, the N frames of to-be-processed images are updated to the 4th through 9th frames; since the 4th through 9th frames all include hands, the 4th through 9th frames are determined to be the 6 consecutive frames of images.
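  The second scenario, with the skip-ahead behavior of this example (jumping past the last frame that lacks the object rather than advancing one frame at a time), can be sketched as follows. The function name and the boolean encoding of "frame contains the first object" are illustrative assumptions:

  ```python
  def find_offline_window(frames, contains_object, n=6):
      """Scenario 2 (completed video), FIG. 4B variant: take n consecutive
      frames starting at the preset index; if any frame lacks the object,
      restart just after the LAST such frame (the 'second image')."""
      start = 0
      while start + n <= len(frames):
          window = frames[start:start + n]
          misses = [i for i, f in enumerate(window) if not contains_object(f)]
          if not misses:
              return start, window
          start += misses[-1] + 1  # frame after the last frame without the object
      return None

  # Demo mirroring FIG. 4B: frame 3 (1-indexed) lacks the hand, so the
  # search jumps straight from frames 1-6 to frames 4-9.
  has_hand = [True, True, False, True, True, True, True, True, True]
  demo = find_offline_window(list(range(1, 10)), lambda f: has_hand[f - 1], n=6)
  ```

  Replacing `misses[-1] + 1` with `1` would give the simpler FIG. 4A behavior of always advancing by a single frame.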
  • the obtained N frames of images are images without added target special effects (special effects to be added to the video).
  • S202 Determine the posture type of the first object in each frame of image.
  • multiple posture types of the first object may be preset.
  • the posture type of the hand may include: open hands, put both hands together, and make a fist.
  • the posture type of the head may include: head up, head down, left head tilted, right head tilted, and so on.
  • the process of obtaining the posture type of the first object in each frame of image is the same. In the following, the process of obtaining the posture type of the first object in the first image will be described.
  • the object area can be detected in the first image, where the object area includes the part of the first image corresponding to the first object, and the object area is processed to obtain the posture type of the first object in the first image.
  • the object area can be detected in the first image through the following feasible implementation: input data representing the first image into the first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, each group of first samples including a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
  • the data input representing the first image may be the first image, a grayscale image of the first image, or the like.
  • the object area may be a rectangular area including the first object in the first image.
  • since the first recognition model is learned from a large number of first samples, the first recognition model can accurately detect the object area in the first image.
  • the object area can be determined based on the output of the first recognition model.
  • the output of the first recognition model may be an image corresponding to the object area in the first image, or may be the positions (for example, coordinates) of at least two vertices of the object area in the first image.
  • when the output of the first recognition model is two vertices of the object area, the two vertices are two vertices on a diagonal of the area.
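  For illustration, two diagonal vertices are enough to recover the rectangular object area. A minimal sketch, assuming an image stored as nested lists of pixel rows and (x, y) vertex coordinates (both assumptions for demonstration):

  ```python
  def crop_object_area(image, p1, p2):
      """Turn two diagonal vertices output by a detection model into the
      rectangular object-area crop of an image (a list of pixel rows)."""
      (x1, y1), (x2, y2) = p1, p2
      left, right = sorted((x1, x2))   # vertex order is not guaranteed
      top, bottom = sorted((y1, y2))
      return [row[left:right + 1] for row in image[top:bottom + 1]]

  # Demo on a 4x4 "image" of pixel ids: crop the area between (1, 1) and (2, 2).
  demo = crop_object_area(
      [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]],
      (1, 1), (2, 2))
  ```

  The crop (or the vertex coordinates themselves) is then what would be fed to the second recognition model.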
  • the posture type of the first object in the first image can be obtained through the following feasible implementation: input data representing the object area into the second recognition model to obtain the posture type of the first object in the first image;
  • the second recognition model is obtained by learning multiple groups of second samples, each group of second samples including a sample object area and the sample posture type recognized in the sample object area, and the sample object area includes an image corresponding to the first object.
  • the data representing the object area may be an image corresponding to the object area, or the positions (for example, coordinates) of at least two vertices of the object area in the first image.
  • when the data representing the object area is two vertices of the object area, the two vertices are two vertices on a diagonal of the area.
  • the posture type of the first object in the first image can be determined according to the output of the second recognition model.
  • the output of the second recognition model may be characters (for example, numbers, letters, etc.) representing the posture type.
  • the posture type of the first object can be accurately determined in the object area through the second recognition model.
  • S203 Determine the posture distribution of the first object according to the posture type of the first object in each frame of image.
  • the posture distribution of the first object is used to indicate the law of change of the posture of the first object.
  • the first object is a hand
  • N is 6.
  • the posture types of the first object in the 6 frames of images are, in order: hands open facing each other, hands open facing each other, hands open facing each other, hands together, hands together, hands together.
  • the posture distribution of the first object can thus be obtained as: from hands open to hands together.
  • the posture distribution of the first object may be obtained through the following feasible implementation: the N frames of images are grouped according to the order of the N frames of images in the video to obtain at least two groups of images, each group of images including consecutive M frames of images, where M is an integer greater than 1; the posture type corresponding to each group of images is determined according to the posture type of the first object in each image in the group; and the posture distribution of the first object is obtained according to the posture type corresponding to each group of images.
  • when the number of images in a group of images whose posture type is a first posture type is greater than or equal to a first threshold, it is determined that the posture type corresponding to the group of images is the first posture type.
  • for example, when the number of images in a group of images whose posture type is hands together is greater than or equal to the first threshold, it is determined that the posture type corresponding to the group of images is hands together.
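  This first-threshold rule can be sketched as a vote over the frame-level posture types within one group. The helper below is illustrative only; the type labels and threshold value are assumptions:

  ```python
  from collections import Counter

  def group_posture_type(types, threshold):
      """Posture type of one group of M frames: the most common
      frame-level type, provided its count reaches the first threshold;
      otherwise no type is assigned to the group."""
      posture, count = Counter(types).most_common(1)[0]
      return posture if count >= threshold else None

  # Demo: two of three frames show "together", so with threshold 2 the
  # group's posture type is "together".
  demo = group_posture_type(["together", "together", "open"], threshold=2)
  ```

  Voting per group is also what gives the method its error tolerance: a single misrecognized frame cannot change the group's posture type.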
  • Table 1 merely illustrates the grouping of images by way of example, showing the posture type corresponding to each image.
  • S204 Add special effects to the video according to the posture distribution of the first object and the N frames of images.
  • it is determined whether the posture distribution of the first object satisfies a preset posture distribution; when the posture distribution of the first object satisfies the preset posture distribution, the target special effect corresponding to the preset posture distribution is obtained, and the target special effect is added to the video according to the N frames of images.
  • the process of adding target special effects to the video according to N frames of images is also different.
  • the video is a video being shot, that is, while the video is being shot, special effects are added to the video being shot.
  • special effects can be added to the Nth frame of the N frames of images.
  • a special effect is added to the video at the playback time corresponding to the Nth frame of image, and the display time of the special effect may be a preset duration.
  • the video is a completed video, that is, special effects are added to the completed video.
  • special effects can be added to at least one of the N frames of images.
  • special effects can be added to all N frames of images, that is, special effects can be added to the video between the playback moments corresponding to the N frames of images.
  • special effects are added to some of the N frames of images, that is, special effects are added to the video between playback moments corresponding to the partial images of the N frames of images.
  • consecutive N frames of images including the first object are determined in the video, and the posture type of the first object in each frame of image is obtained.
  • the posture distribution of the first object is obtained according to the posture type of the first object in each frame of image, and special effects are added to the video according to the posture distribution of the first object and N frames of images.
  • the posture distribution of the first object in the video is determined with the video frame as the unit. According to the posture distribution of the first object in the video, whether a preset action appears in the video can be accurately determined, and thus whether to add special effects to the video can be accurately determined.
  • special effects are added to the video based on the consecutive N frames of images, that is, special effects are added to the video with the video frame as the granularity, which improves the accuracy of adding the special effects.
  • FIG. 5 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure. Referring to Figure 5, the method may include:
  • S501 Acquire consecutive N frames of images in the video.
  • S502 Group the N frames of images according to the order of the N frames of images in the video to obtain at least two sets of images.
  • each group of images includes consecutive M frames of images, and M is an integer greater than 1.
  • the first M consecutive frames of images are grouped into one group,
  • the (M+1)th through 2Mth frames of images are grouped into one group,
  • and so on, until all N frames of images are grouped.
  • N is an integer multiple of M.
  • S503 Determine the posture type of the first object in each image in each group of images.
  • S504 Determine a posture type corresponding to each group of images according to the posture type of the first object in each image in each group of images.
  • when the number of images in a group of images whose posture type is a first posture type is greater than or equal to a first threshold, it is determined that the posture type corresponding to the group of images is the first posture type.
  • for example, when the number of images in a group of images whose posture type is hands together is greater than or equal to the first threshold, it is determined that the posture type corresponding to the group of images is hands together.
  • S505 Determine the posture distribution of the first object according to the posture type corresponding to each group of images.
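  Steps S502 through S505 can be sketched end to end: grouping the frame-level posture types into runs of M, voting one type per group, and reading off the change law. Collapsing consecutive duplicate group types into the distribution is an assumption about how the "law of change" is encoded, as are the type labels:

  ```python
  from collections import Counter

  def posture_distribution(frame_types, m, threshold):
      """S502-S505: group N frame-level posture types into runs of M,
      vote one type per group (first-threshold rule), then collapse
      consecutive duplicates to obtain the change law."""
      assert len(frame_types) % m == 0, "N is assumed an integer multiple of M"
      dist = []
      for i in range(0, len(frame_types), m):
          posture, count = Counter(frame_types[i:i + m]).most_common(1)[0]
          if count >= threshold and (not dist or dist[-1] != posture):
              dist.append(posture)
      return dist

  # Demo matching FIG. 6: N=6, M=3; the first group votes "open" and the
  # second votes "together", giving the distribution open -> together.
  demo = posture_distribution(
      ["open", "open", "open", "together", "together", "together"],
      m=3, threshold=2)
  ```

  Matching against a preset distribution (S506) then reduces to comparing two such sequences of posture types.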
  • the first object is a hand
  • S506 Determine whether the posture distribution of the first object meets the preset posture distribution.
  • when the law of change of the posture of the first object indicated by the posture distribution of the first object is the same as the law of change indicated by the preset posture distribution, it is determined that the posture distribution of the first object meets the preset posture distribution.
  • the correspondence between posture distributions and special effects can be preset, and accordingly, the target special effect can be determined according to the preset posture distribution and the correspondence.
  • the posture distribution of the first object in the video is determined in units of video frames. According to the posture distribution of the first object in the video, whether a preset action appears in the video can be accurately determined, and thus whether to add special effects to the video can be accurately determined.
  • the special effects are added to the video based on consecutive N frames of images, that is, the special effects can be added to the video at the granularity of the video frame, which improves the accuracy of adding the special effects. Further, even if the posture type of the first object in the individual image is incorrectly recognized, the correct posture distribution of the first object can still be obtained, so that the error tolerance performance of the video processing is higher.
  • FIG. 6 is a schematic diagram of a video processing process provided by an embodiment of the application. Assume that the first object is a hand, N is 6, and the special effect to be added is flower spreading. Referring to FIG. 6, assume that the six acquired images are P1, P2, ..., P6.
  • As shown in FIG. 6, P1, P2, and P3 are divided into one group of images, and P4, P5, and P6 are divided into another group of images.
  • the data representing the 6 images is respectively input into the first preset model to obtain the object area in each image, where the object area includes the hand.
  • the posture types determined for the six images are: hands open facing each other, hands open facing each other, hands folded, hands folded, hands folded, hands folded. From this, it can be determined that the posture type corresponding to the first group of images is hands open facing each other, and the posture type corresponding to the second group of images is hands folded.
  • the posture distribution corresponding to the first object (the hand) is therefore: hands open facing each other, then hands folded. If this posture distribution satisfies the preset posture distribution, the flower-spreading special effect is added to the 6 images. Of course, the flower-spreading special effect may also be added to only some of the 6 images.
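The FIG. 6 example can be put together as a small end-to-end sketch (the labels, group size, and preset distribution follow the example above; all names are invented here for illustration):

```python
from collections import Counter

# Hypothetical pipeline for the FIG. 6 example: six per-frame posture
# labels, grouped in threes, a majority vote per group, and a comparison
# against the preset "hands open -> hands folded" distribution.
def posture_distribution(labels, m):
    groups = [labels[i:i + m] for i in range(0, len(labels), m)]
    return [Counter(g).most_common(1)[0][0] for g in groups]

# One misrecognized frame in the first group is absorbed by the vote.
labels = ["open", "open", "folded", "folded", "folded", "folded"]
distribution = posture_distribution(labels, 3)
add_flower_effect = distribution == ["open", "folded"]
```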
  • FIG. 7 is a schematic structural diagram of a video processing device provided by an embodiment of the disclosure.
  • the video processing device 10 may include an acquiring module 11, a first determining module 12, a second determining module 13, and an adding module 14.
  • the acquiring module 11 is configured to acquire consecutive N frames of images in a video, each frame of the image includes a first object, and the N is an integer greater than 1;
  • the first determining module 12 is configured to determine the posture type of the first object in each frame of image
  • the second determining module 13 is configured to determine the posture distribution of the first object according to the posture type of the first object in each frame of image, and the posture distribution is used to indicate the posture of the first object. Law of change
  • the adding module 14 is configured to add special effects to the video according to the posture distribution of the first object and the N frames of images.
  • the video processing device provided in the embodiments of the present disclosure can execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects are similar, and details are not described herein again.
  • the adding module 14 is specifically configured to:
  • the second determining module 13 is specifically configured to:
  • the N frames of images are grouped to obtain at least two groups of images, each group of images including consecutive M frames of images, where M is an integer greater than 1;
  • the first determining module 12 is specifically configured to:
  • the object area is processed to obtain the posture type of the first object in the first image.
  • the first determining module 12 is specifically configured to:
  • the data representing the first image is input to a first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, where each group of first samples includes a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
  • the first determining module 12 is specifically configured to:
  • the video is a video being shot;
  • the acquisition module 11 is specifically configured to:
  • acquiring N frames of to-be-processed images in the video, where the N frames of to-be-processed images include the last N frames of images that have been captured in the video;
  • each of the N frames of images to be processed includes the first object, and if so, the N frames of images to be processed are determined as the N frames of images.
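For a video that is still being shot, the selection of the last N captured frames can be sketched as a sliding window; the class and the `has_object` flag below are illustrative stand-ins for real frame data and an object detector:

```python
from collections import deque

# Hypothetical sliding window over a live stream: keep the last N frames
# and report the window only when every frame in it contains the first
# object (represented here by a precomputed has_object flag).
class FrameWindow:
    def __init__(self, n):
        self._frames = deque(maxlen=n)
        self._n = n

    def push(self, frame, has_object):
        self._frames.append((frame, has_object))
        if len(self._frames) == self._n and all(h for _, h in self._frames):
            return [f for f, _ in self._frames]  # the N frames of images
        return None  # keep waiting for a valid window
```

Using `deque(maxlen=n)` means the oldest frame is dropped automatically, so the check always runs over the last N captured frames.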
  • the adding module 14 is specifically configured to:
  • the video is a completed video; the acquisition module 11 is specifically configured to:
  • the to-be-processed image selection operation includes: acquiring, from a preset image of the video, consecutive N frames of to-be-processed images in the video;
  • the operation of determining N frames of images includes: determining whether each frame of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images to be the N frames of images, and if not, updating the preset image to a frame of image after the preset image in the video;
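For a completed video, the repeated select-then-check loop above amounts to sliding the preset start frame forward; a compact sketch with the detector results abstracted as a boolean list (names invented for illustration):

```python
# Hypothetical scan of a finished video: advance the preset start frame
# until N consecutive frames all contain the first object.
def find_n_frames(has_object, n):
    for start in range(len(has_object) - n + 1):
        if all(has_object[start:start + n]):
            return start  # index of the first frame of the N-frame window
    return None  # no valid window exists in this video
```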
  • the adding module 14 is specifically configured to:
  • the special effect is added to at least one of the N frames of images.
  • the obtaining module 11 is specifically configured to:
  • the N frames of images are determined in the video.
  • FIG. 8 is a schematic structural diagram of another video processing device provided by an embodiment of the disclosure. Based on the embodiment shown in FIG. 7, referring to FIG. 8, the video processing device 10 further includes a third determining module 15, where:
  • the third determining module 15 is configured to determine that the target special effect is not added to the N frames of images before the acquiring module 11 acquires consecutive N frames of images in the video.
  • the video processing device provided in the embodiments of the present disclosure can execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects are similar, and details are not described herein again.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure.
  • the electronic device 20 may be a terminal device or a server.
  • terminal devices may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA for short), tablets (Portable Android Device, PAD for short), portable multimedia players (Portable Media Player, PMP for short), mobile terminals such as vehicle-mounted terminals (for example, vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 9 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 20 may include a processing device (such as a central processing unit, a graphics processor, etc.) 21, which may execute various appropriate actions and processing based on a program stored in a read-only memory (Read Only Memory, ROM for short) 22 or a program loaded from a storage device 28 into a random access memory (Random Access Memory, RAM for short) 23.
  • the RAM 23 also stores various programs and data required for the operation of the electronic device 20.
  • the processing device 21, the ROM 22, and the RAM 23 are connected to each other through a bus 24.
  • An input/output (I/O) interface 25 is also connected to the bus 24.
  • the following devices can be connected to the I/O interface 25: an input device 26 including, for example, a touch screen, touch panel, keyboard, mouse, camera, microphone, accelerometer, and gyroscope; an output device 27 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 28 including a magnetic tape, a hard disk, etc.; and a communication device 29.
  • the communication device 29 may allow the electronic device 20 to perform wireless or wired communication with other devices to exchange data.
  • Although FIG. 9 shows an electronic device 20 having various devices, it should be understood that it is not required to implement or have all of the illustrated devices; more or fewer devices may alternatively be implemented or provided.
  • the process described above with reference to the flowchart can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program may be downloaded and installed from the network through the communication device 29, or installed from the storage device 28, or installed from the ROM 22.
  • When the computer program is executed by the processing device 21, the above-mentioned functions defined in the method of the embodiments of the present disclosure are executed.
  • the aforementioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • the computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as a part of a carrier wave, and a computer-readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium.
  • the computer-readable signal medium may send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable medium can be transmitted by any suitable medium, including but not limited to: wire, optical cable, RF (Radio Frequency), etc., or any suitable combination of the above.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
  • the foregoing computer-readable medium carries one or more programs, and when the foregoing one or more programs are executed by the electronic device, the electronic device is caused to execute the method shown in the foregoing embodiment.
  • the computer program code used to perform the operations of the present disclosure may be written in one or more programming languages or a combination thereof.
  • the above-mentioned programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or Wide Area Network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • each block in the flowchart or block diagram can represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the block may also occur in a different order from the order marked in the drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and combinations of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments described in the present disclosure may be implemented in a software manner, or may be implemented in a hardware manner.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

Provided in embodiments of the present disclosure are a video processing method and apparatus, and a device; the method comprises: acquiring N continuous frames of an image from within a video, each frame of the image comprising a first object, and N being an integer greater than one; determining the posture type of the first objects in each frame of the image, and according to the posture type of the first objects in each frame of the image, determining the posture distribution of the first objects, the posture distribution being used to indicate a change rule for the posture of the first objects; and according to the posture distribution of the first objects and the N frames of the image, adding a special effect in the video. The accuracy of adding a special effect in a video is improved.

Description

Video processing method, device and equipment
Cross-reference to related applications
This application claims priority to Chinese patent application No. 201910304462.5, entitled "Video processing method, device and equipment", filed on April 16, 2019, the entirety of which is incorporated herein by reference.
Technical field
The embodiments of the present disclosure relate to the field of computer technology, and in particular to a video processing method, device, and equipment.
Background
In order to improve the video playback effect, special effects can be added to a video; for example, special effects can include adding light flicker, adding a preset sound, and so on.
When adding special effects to a video, a person usually watches the video; when it is determined that a preset action appears in the video, a special effect is associated with the playback moment corresponding to the preset action; during playback, when that moment is reached, the corresponding special effect is displayed in the video. For example, if it is manually observed that a clapping action occurs at the 10th second of the video, a special effect is associated with the 10th second, and when the video plays to the 10th second, a special effect related to the clapping is displayed. However, in the above process, special effects are added according to the playback time of the video, and there may be a deviation between the moment when the preset action appears in the video and the moment when the special effect corresponding to the preset action is displayed, resulting in poor accuracy of the special effects added to the video.
Summary
The embodiments of the present disclosure provide a video processing method, device, and equipment, which improve the accuracy of special effects added to a video.
In a first aspect, embodiments of the present disclosure provide a video processing method, including:
acquiring consecutive N frames of images in a video, where each frame of the images includes a first object, and N is an integer greater than 1;
determining the posture type of the first object in each frame of image, and determining the posture distribution of the first object according to the posture type of the first object in each frame of image, where the posture distribution is used to indicate the change rule of the posture of the first object;
adding a special effect to the video according to the posture distribution of the first object and the N frames of images.
In a possible implementation manner, adding a special effect to the video according to the posture distribution of the first object and the N frames of images includes:
judging whether the posture distribution of the first object meets a preset posture distribution;
when the posture distribution of the first object meets the preset posture distribution, acquiring a target special effect corresponding to the preset posture distribution, and adding the target special effect to the video according to the N frames of images.
In a possible implementation manner, determining the posture distribution of the first object according to the posture type of the first object in each frame of image includes:
grouping the N frames of images according to their order in the video to obtain at least two groups of images, where each group of images includes consecutive M frames of images, and M is an integer greater than 1;
determining the posture type corresponding to each group of images according to the posture type of the first object in each image of each group of images;
determining the posture distribution of the first object according to the posture type corresponding to each group of images.
In a possible implementation manner, for any first image in the N frames of images, determining the posture type of the first object in the first image includes:
detecting an object area in the first image, where the object area includes a part of the first image corresponding to the first object;
processing the object area to determine the posture type of the first object in the first image.
In a possible implementation manner, detecting the object area in the first image includes:
inputting data representing the first image into a first recognition model to obtain the object area, where the first recognition model is obtained by learning multiple groups of first samples, each group of first samples includes a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
In a possible implementation manner, processing the object area to determine the posture type of the first object in the first image includes:
inputting data representing the object area into a second recognition model to obtain the posture type of the first object in the first image, where the second recognition model is obtained by learning multiple groups of second samples, each group of second samples includes a sample object area and a sample posture type recognized in the sample object area, and the sample object area includes an image corresponding to the first object.
In a possible implementation manner, the video is a video being shot, and acquiring consecutive N frames of images in the video includes:
acquiring N frames of to-be-processed images in the video, where the N frames of to-be-processed images include the last N frames of images that have been captured in the video;
judging whether each frame of the N frames of to-be-processed images includes the first object, and if so, determining the N frames of to-be-processed images to be the N frames of images.
In a possible implementation manner, adding the target special effect to the video according to the N frames of images includes:
adding the special effect to the Nth frame of the N frames of images.
In a possible implementation manner, the video is a video whose shooting has been completed, and acquiring consecutive N frames of images in the video includes:
performing a to-be-processed image selection operation, which includes: acquiring consecutive N frames of to-be-processed images in the video, starting from a preset image of the video;
performing an N-frame image determination operation, which includes: judging whether each frame of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images to be the N frames of images, and if not, updating the preset image to a frame of image after the preset image in the video;
repeating the to-be-processed image selection operation and the N-frame image determination operation until the N frames of images are determined.
In a possible implementation manner, adding the target special effect to the video according to the N frames of images includes:
adding the special effect to at least one of the N frames of images.
In a possible implementation manner, acquiring consecutive N frames of images in the video includes:
determining the special effect to be added to the video;
determining the first object corresponding to the special effect to be added to the video;
determining the N frames of images in the video according to the first object.
In a possible implementation manner, before acquiring consecutive N frames of images in the video, the method further includes:
determining that the target special effect has not been added to the N frames of images.
In a second aspect, an embodiment of the present disclosure provides a video processing device, including an acquiring module, a first determining module, a second determining module, and an adding module, where:
the acquiring module is configured to acquire consecutive N frames of images in a video, where each frame of the images includes a first object, and N is an integer greater than 1;
the first determining module is configured to determine the posture type of the first object in each frame of image;
the second determining module is configured to determine the posture distribution of the first object according to the posture type of the first object in each frame of image, where the posture distribution is used to indicate the change rule of the posture of the first object;
the adding module is configured to add a special effect to the video according to the posture distribution of the first object and the N frames of images.
In a possible implementation manner, the adding module is specifically configured to:
judge whether the posture distribution of the first object meets a preset posture distribution;
when the posture distribution of the first object meets the preset posture distribution, acquire a target special effect corresponding to the preset posture distribution, and add the target special effect to the video according to the N frames of images.
In a possible implementation manner, the second determining module is specifically configured to:
group the N frames of images according to their order in the video to obtain at least two groups of images, where each group of images includes consecutive M frames of images, and M is an integer greater than 1;
determine the posture type corresponding to each group of images according to the posture type of the first object in each image of each group of images;
obtain the posture distribution of the first object according to the posture type corresponding to each group of images.
In a possible implementation manner, for any first image in the N frames of images, the first determining module is specifically configured to:
detect an object area in the first image, where the object area includes a part of the first image corresponding to the first object;
process the object area to obtain the posture type of the first object in the first image.
In a possible implementation manner, the first determining module is specifically configured to:
input data representing the first image into a first recognition model to obtain the object area, where the first recognition model is obtained by learning multiple groups of first samples, each group of first samples includes a sample image and a sample object area in the sample image, and the sample image includes an image corresponding to the first object.
In a possible implementation manner, the first determining module is specifically configured to:
input data representing the object area into a second recognition model to obtain the posture type of the first object in the first image, where the second recognition model is obtained by learning multiple groups of second samples, each group of second samples includes a sample object area and a sample posture type recognized in the sample object area, and the sample object area includes an image corresponding to the first object.
In a possible implementation manner, the video is a video being shot, and the acquiring module is specifically configured to:
acquire N frames of to-be-processed images in the video, where the N frames of to-be-processed images include the last N frames of images that have been captured in the video;
judge whether each frame of the N frames of to-be-processed images includes the first object, and if so, determine the N frames of to-be-processed images to be the N frames of images.
In a possible implementation manner, the adding module is specifically configured to:
add the special effect to the Nth frame of the N frames of images.
In a possible implementation manner, the video is a video whose shooting has been completed, and the acquiring module is specifically configured to:
perform a to-be-processed image selection operation, which includes: acquiring consecutive N frames of to-be-processed images in the video, starting from a preset image of the video;
perform an N-frame image determination operation, which includes: judging whether each frame of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images to be the N frames of images, and if not, updating the preset image to a frame of image after the preset image in the video;
repeat the to-be-processed image selection operation and the N-frame image determination operation until the N frames of images are determined.
In a possible implementation manner, the adding module is specifically configured to:
add the special effect to at least one of the N frames of images.
在一种可能的实施方式中,所述获取模块具体用于:In a possible implementation manner, the acquisition module is specifically configured to:
确定待在所述视频中增加的特效;Determine the special effects to be added to the video;
确定待在所述视频中增加的特效对应的所述第一对象;Determine the first object corresponding to the special effect to be added in the video;
根据所述第一对象,在所述视频中确定所述N帧图像。According to the first object, the N frames of images are determined in the video.
在一种可能的实施方式中,所述装置还包括第三确定模块,其中,In a possible implementation manner, the device further includes a third determining module, wherein:
所述第三确定模块用于,在所述获取模块在视频中获取连续的N帧图像之前,确定未在所述N帧图像中增加所述目标特效。The third determining module is configured to determine that the target special effect is not added to the N frames of images before the acquiring module acquires consecutive N frames of images in the video.
第三方面,本公开实施例提供一种电子设备,包括:处理器,所述处理器与存储器耦合;In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a processor coupled with a memory;
所述存储器用于,存储计算机程序;The memory is used to store a computer program;
所述处理器用于,执行所述存储器中存储的计算机程序,以使得所述终端设备执行上述第一方面任一项所述的方法。The processor is configured to execute the computer program stored in the memory, so that the terminal device executes the method according to any one of the foregoing first aspects.
第四方面,本公开实施例提供一种可读存储介质,包括程序或指令,当所述程序或指令在计算机上运行时,如上述第一方面任意一项所述的方法被执行。In a fourth aspect, an embodiment of the present disclosure provides a readable storage medium, including a program or instruction, and when the program or instruction runs on a computer, the method described in any one of the foregoing first aspect is executed.
本公开实施例提供的视频处理方法、装置及设备，当需要在视频中增加第一对象对应的特效时，在视频中确定连续的、包括第一对象的N帧图像，获取每帧图像中的第一对象的姿势类型，并根据每帧图像中的第一对象的姿势类型，获取第一对象的姿势分布，根据第一对象的姿势分布和N帧图像，在视频中增加特效。在上述过程中，以视频帧为单位，确定视频中第一对象的姿势分布，根据视频中第一对象的姿势分布，可以准确的确定得到视频中是否出现预设动作，进而可以准确的确定得到是否在视频中增加特效。在确定在视频中增加特效时，根据连续的N帧图像在视频中增加特效，即，可以以视频帧为粒度在视频中增加特效，提高了增加特效的精确度。In the video processing method, apparatus and device provided by the embodiments of the present disclosure, when a special effect corresponding to a first object needs to be added to a video, N consecutive frames of images including the first object are determined in the video, the posture type of the first object in each frame is obtained, the posture distribution of the first object is obtained according to the posture type of the first object in each frame, and the special effect is added to the video according to the posture distribution of the first object and the N frames of images. In the above process, the posture distribution of the first object in the video is determined on a per-frame basis; according to this posture distribution, whether a preset action appears in the video can be determined accurately, and hence whether to add the special effect to the video can be determined accurately. When it is determined that the special effect is to be added, it is added on the basis of the N consecutive frames of images, that is, at the granularity of video frames, which improves the precision of adding special effects.
附图说明Description of the drawings
为了更清楚地说明本公开实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍,显而易见地,下面描述中的附图是本发明的一些实施例,对于本领域普通技术人员来讲,在 不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。In order to more clearly explain the technical solutions in the embodiments of the present disclosure or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative labor.
图1为本公开实施例提供的视频处理的架构图;FIG. 1 is an architecture diagram of video processing provided by an embodiment of the disclosure;
图2为本公开实施例提供的一种视频处理方法的流程示意图;2 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure;
图3A为本公开实施例提供的一种视频帧的示意图;3A is a schematic diagram of a video frame provided by an embodiment of the disclosure;
图3B为本公开实施例提供的另一种视频帧的示意图;FIG. 3B is a schematic diagram of another video frame provided by an embodiment of the disclosure;
图4A为本公开实施例提供的又一种视频帧的示意图;4A is a schematic diagram of another video frame provided by an embodiment of the disclosure;
图4B为本公开实施例提供的另一种视频帧的示意图;4B is a schematic diagram of another video frame provided by an embodiment of the disclosure;
图5为本公开实施例提供的另一种视频处理方法的流程示意图;FIG. 5 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure;
图6为本申请实施例提供的视频处理过程示意图;FIG. 6 is a schematic diagram of a video processing process provided by an embodiment of the application;
图7为本公开实施例提供的一种视频处理装置的结构示意图;FIG. 7 is a schematic structural diagram of a video processing device provided by an embodiment of the disclosure;
图8为本公开实施例提供的另一种视频处理装置的结构示意图;FIG. 8 is a schematic structural diagram of another video processing device provided by an embodiment of the disclosure;
图9为本公开实施例提供的电子设备的结构示意图。FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure.
具体实施方式detailed description
为使本公开实施例的目的、技术方案和优点更加清楚，下面将结合本公开实施例中的附图，对本公开实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。To make the objectives, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
图1为本公开实施例提供的视频处理的架构图。在视频中增加特效时，通常是判断视频中是否出现了预设动作（例如，鼓掌、摇头等），当确定视频中出现了预设动作之后，则在视频中增加预设的动作对应的特效。请参见图1，当需要在视频中增加预设动作（假设预设动作对应第一对象，即，由第一对象执行预设动作，第一对象可以为手、腿、头、车辆等）对应的特效时，可以在视频中进行图像提取，以得到N张连续的图像（图像1、图像2、……、图像N）。可以对提取到的每张图像进行识别处理，以得到每张图像中第一对象的姿势类型，并根据每帧图像中的第一对象的姿势类型，获取第一对象的姿势分布，在第一对象的姿势分布满足预设分布时，则可以确定视频中出现了预设动作，则在视频中增加预设动作对应的特效。FIG. 1 is an architecture diagram of video processing provided by an embodiment of the disclosure. When adding a special effect to a video, it is usually determined whether a preset action (for example, clapping, shaking the head, etc.) appears in the video; once it is determined that the preset action appears, the special effect corresponding to the preset action is added to the video. Referring to FIG. 1, when a special effect corresponding to a preset action needs to be added to the video (assuming the preset action corresponds to a first object, i.e., the preset action is performed by the first object, which may be a hand, a leg, a head, a vehicle, etc.), images can be extracted from the video to obtain N consecutive images (image 1, image 2, ..., image N). Each extracted image can be recognized to obtain the posture type of the first object in that image, and the posture distribution of the first object is obtained according to the posture type of the first object in each image. When the posture distribution of the first object satisfies a preset distribution, it can be determined that the preset action appears in the video, and the special effect corresponding to the preset action is added to the video.
在上述过程中，以视频帧为单位，确定视频中第一对象的姿势分布，根据视频中第一对象的姿势分布，可以准确的确定得到视频中是否出现预设动作，进而可以准确的确定得到是否在视频中增加特效。在确定在视频中增加特效时，根据连续的N帧图像在视频中增加特效，即，可以以视频帧为粒度在视频中增加特效，提高了增加特效的精确度。In the above process, the posture distribution of the first object in the video is determined on a per-frame basis; according to this posture distribution, whether a preset action appears in the video can be determined accurately, and hence whether to add the special effect can be determined accurately. When it is determined that the special effect is to be added, it is added on the basis of the N consecutive frames of images, that is, at the granularity of video frames, which improves the precision of adding special effects.
下面,通过具体实施例对本申请所示的技术方案进行详细说明。需要说明的是,下面几个具体实施例可以相互结合,对于相同或相似的内容,在不同的实施例中不再进行重复说明。Hereinafter, the technical solution shown in this application will be described in detail through specific embodiments. It should be noted that the following specific embodiments can be combined with each other, and the same or similar content will not be repeated in different embodiments.
图2为本公开实施例提供的一种视频处理方法的流程示意图。请参见图2,该方法可以包括:FIG. 2 is a schematic flowchart of a video processing method provided by an embodiment of the disclosure. See Figure 2. The method can include:
S201、在视频中获取连续的N帧图像。S201: Acquire consecutive N frames of images in the video.
本公开实施例的执行主体可以为电子设备,也可以为设置在电子设备中的视频处理装置。可选的,视频处理装置可以通过软件实现,也可以通过软件和硬件的结合实现。The execution subject of the embodiments of the present disclosure may be an electronic device, or may be a video processing device provided in the electronic device. Optionally, the video processing device can be implemented by software, or by a combination of software and hardware.
可选的,电子设备可以为手机、电脑、具有处理功能的摄像机等设备。Optionally, the electronic device can be a mobile phone, a computer, a video camera with processing functions, and other devices.
其中,每帧图像中均包括第一对象,N为大于1的整数。Wherein, each frame of image includes the first object, and N is an integer greater than 1.
每帧图像中均包括完整的视频内容,例如,当视频为经过压缩处理的视频时,则N帧图像均为视频中的关键帧。Each frame of image includes complete video content. For example, when the video is a compressed video, the N frames of images are all key frames in the video.
可选的,第一对象可以为手、腿、头、车辆、飞机等。Optionally, the first object may be a hand, leg, head, vehicle, airplane, etc.
可选的,可以先确定待在视频中增加的特效,确定待在视频中增加的特效对应的第一对象,并根据第一对象,在视频中确定N帧图像。例如,在确定待在视频中增加的特效对应的第一对象时,可以先确定待在视频中增加的特效对应的预设动作,确定执行该预设动作的对象为第一对象。Optionally, the special effect to be added to the video may be determined first, the first object corresponding to the special effect to be added to the video is determined, and N frames of images are determined in the video according to the first object. For example, when determining the first object corresponding to the special effect to be added in the video, the preset action corresponding to the special effect to be added in the video may be determined first, and the object performing the preset action is determined as the first object.
例如，假设待在视频中增加的特效为灯光特效，灯光特效对应的预设动作为鼓掌动作，执行鼓掌动作的对象为手，因此，可以确定第一对象为手，相应的，在视频中确定的连续N张图像均包括手。For example, suppose the special effect to be added to the video is a lighting effect, the preset action corresponding to the lighting effect is clapping, and the object performing the clapping action is a hand. Therefore, it can be determined that the first object is a hand, and accordingly each of the N consecutive images determined in the video includes a hand.
当视频处理的应用场景不同时,确定连续的N帧图像的过程也不同,例如,可以包括至少如下两种可能的应用场景:When the application scenarios of video processing are different, the process of determining consecutive N frames of images is also different. For example, it may include at least the following two possible application scenarios:
一种可能的应用场景:视频为正在拍摄的视频,即,一边进行视频拍摄,一边在正在拍摄的视频中增加特效。A possible application scenario: the video is a video being shot, that is, while the video is being shot, special effects are added to the video being shot.
在该种可能的应用场景中，可以通过如下可行的实现方式获取连续的N帧图像：在视频中获取N帧待处理图像，N帧待处理图像中包括视频中已拍摄的最后N帧图像，判断N帧待处理图像中是否每帧待处理图像中均包括第一对象，若是，则将N帧待处理图像确定为所述N帧图像。若否，则不将该N帧待处理图像确定为所述N帧图像，可以在拍摄得到新的图像之后，更新N帧待处理图像，并重复上述过程，直至确定得到所述N帧图像。In this possible application scenario, N consecutive frames of images can be obtained through the following feasible implementation: N frames of to-be-processed images are obtained from the video, the N frames of to-be-processed images including the last N frames that have been captured; it is determined whether each of the N frames of to-be-processed images includes the first object, and if so, the N frames of to-be-processed images are determined as the N frames of images. If not, the N frames of to-be-processed images are not determined as the N frames of images; instead, after a new image is captured, the N frames of to-be-processed images are updated and the above process is repeated until the N frames of images are obtained.
下面,结合图3A-图3B,对在该种应用场景中确定连续的N帧图像的过程进行详细说明。In the following, the process of determining consecutive N frames of images in this application scenario will be described in detail with reference to FIGS. 3A-3B.
图3A为本公开实施例提供的一种视频帧的示意图。假设第一对象为手，N为6。请参见图3A，假设当前拍摄的最后一帧图像为第80帧图像，第75帧图像至第80帧图像中均包括手，由于拍摄得到的最后6帧图像（第75帧图像至第80帧图像）中均包括手，则可以将第75帧图像至第80帧图像确定为连续的6帧图像。FIG. 3A is a schematic diagram of a video frame provided by an embodiment of the disclosure. Assume the first object is a hand and N is 6. Referring to FIG. 3A, assume the last frame captured so far is frame 80 and frames 75 to 80 all include a hand. Since the last six captured frames (frames 75 to 80) all include a hand, frames 75 to 80 can be determined as the six consecutive frames of images.
图3B为本公开实施例提供的另一种视频帧的示意图。假设第一对象为手，N为6。请参见图3B，在T1时刻，拍摄的最后一帧图像为第80帧图像，其中，第75-76、78-80帧图像中包括手，第77帧图像中不包括手，由于最后拍摄得到的6帧图像中存在不包括手的图像，则继续进行拍摄，直至在T2时刻时，拍摄得到的最后一帧图像为第83帧图像，且第78帧图像至第83帧图像中均包括手，则将第78帧图像至第83帧图像确定为连续的6帧图像。FIG. 3B is a schematic diagram of another video frame provided by an embodiment of the disclosure. Assume the first object is a hand and N is 6. Referring to FIG. 3B, at time T1 the last captured frame is frame 80, where frames 75-76 and 78-80 include a hand and frame 77 does not. Since one of the last six captured frames does not include a hand, shooting continues until, at time T2, the last captured frame is frame 83 and frames 78 to 83 all include a hand; frames 78 to 83 are then determined as the six consecutive frames of images.
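The live-capture selection above can be sketched as a rolling check over per-frame detection results. The function name and the boolean flag list are illustrative stand-ins, assuming some upstream detector reports whether the first object appears in each captured frame; this is a sketch of the scheme, not part of the disclosure itself.

```python
from collections import deque

def find_live_window(frame_has_object, n):
    """Scan a stream of per-frame detection flags (True if the first
    object, e.g. a hand, was detected in that frame) and return the
    index of the last frame of the first window in which the most
    recent n frames all contain the object, or None otherwise."""
    last_n = deque(maxlen=n)            # rolling buffer of the last n flags
    for idx, has_object in enumerate(frame_has_object):
        last_n.append(has_object)
        if len(last_n) == n and all(last_n):
            return idx                  # frames idx-n+1 .. idx are the N frames
    return None

# Scenario of Fig. 3B (re-indexed from 0): the third flag (frame 77)
# lacks the hand, so the window only closes once six later frames
# (frames 78-83) have all been captured.
flags = [True, True, False, True, True, True, True, True, True]
```

With `n = 6`, `find_live_window(flags, 6)` returns index 8, i.e. the window ends at the ninth captured frame, matching the Fig. 3B behavior.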
另一种可能的应用场景:视频为拍摄完成的视频,即,在已拍摄完成的视频中增加特效。Another possible application scenario: the video is a completed video, that is, special effects are added to the completed video.
在该种可能的应用场景中，可以通过如下可行的实现方式获取连续的N帧图像：执行待处理图像选择操作，待处理图像选择操作包括：从视频的预设图像起，在视频中获取连续的N帧待处理图像。执行N帧图像确定操作，N帧图像确定操作包括：判断N帧待处理图像中是否每帧待处理图像中均包括第一对象对应的图像，若是，则将N帧待处理图像确定为N帧图像，若否，则将预设图像更新为视频中预设图像之后的一帧图像。重复执行待处理图像选择操作和N帧图像确定操作，直至确定得到N帧图像。In this possible application scenario, N consecutive frames of images can be obtained through the following feasible implementation: a to-be-processed image selection operation is performed, which includes: starting from a preset image of the video, acquiring N consecutive frames of to-be-processed images from the video. An N-frame image determination operation is performed, which includes: determining whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images as the N frames of images, and if not, updating the preset image to an image frame after the preset image in the video. The two operations are repeated until the N frames of images are obtained.
可选的,可以将预设图像更新为视频中预设图像之后的一帧图像。或者,可以将预设图像更新为第二图像的后一帧图像,所述第二图像为所述N帧待处理图像中最后一个不包括第一对象的图像。Optionally, the preset image can be updated to a frame of image after the preset image in the video. Alternatively, the preset image may be updated to the next frame image of the second image, the second image being the last image that does not include the first object in the N frames of images to be processed.
下面,结合图4A-图4B,对在该种应用场景中确定连续的N帧图像的过程进行详细说明。Hereinafter, the process of determining consecutive N frames of images in this kind of application scenario will be described in detail with reference to FIGS. 4A-4B.
图4A为本公开实施例提供的又一种视频帧的示意图。假设第一对象为手，N为6，预设图像为第一帧图像。请参见图4A，初始时，预设图像为第一帧图像，因此，确定N帧待处理图像为第1帧图像至第6帧图像。由于第1帧图像至第6帧图像中的第3帧图像不包括手，则将预设图像更新为第二帧图像，相应的，N帧待处理图像更新为第2帧图像至第7帧图像。由于第2帧图像至第7帧图像中的第3帧图像中不包括手，则将预设图像更新为第三帧图像，相应的，N帧待处理图像更新为第3帧图像至第8帧图像。由于第3帧图像至第8帧图像中的第3帧图像中不包括手，则将预设图像更新为第四帧图像，相应的，N帧待处理图像更新为第4帧图像至第9帧图像，由于第4帧图像至第9帧图像中均包括手，则将第4帧图像至第9帧图像确定为连续的6帧图像。FIG. 4A is a schematic diagram of another video frame provided by an embodiment of the disclosure. Assume the first object is a hand, N is 6, and the preset image is the first frame. Referring to FIG. 4A, initially the preset image is frame 1, so the N frames of to-be-processed images are determined as frames 1 to 6. Since frame 3 among frames 1 to 6 does not include a hand, the preset image is updated to frame 2, and accordingly the N frames of to-be-processed images are updated to frames 2 to 7. Since frame 3 among frames 2 to 7 still does not include a hand, the preset image is updated to frame 3, and the N frames of to-be-processed images are updated to frames 3 to 8. Since frame 3 among frames 3 to 8 still does not include a hand, the preset image is updated to frame 4, and the N frames of to-be-processed images are updated to frames 4 to 9. Since frames 4 to 9 all include a hand, frames 4 to 9 are determined as the six consecutive frames of images.
图4B为本公开实施例提供的另一种视频帧的示意图。假设第一对象为手，N为6，预设图像为第一帧图像。请参见图4B，初始时，预设图像为第一帧图像，因此，确定N帧待处理图像为第1帧图像至第6帧图像。由于第1帧图像至第6帧图像中的第3帧图像不包括手，则在第1帧图像至第6帧图像中确定第二图像，由于第3帧图像中不包括手，因此，将第3帧图像确定为第二图像，因此，将预设图像更新为第4帧图像（第二图像的后一帧图像），相应的，N帧待处理图像更新为第4帧图像至第9帧图像，由于第4帧图像至第9帧图像中均包括手，则将第4帧图像至第9帧图像确定为连续的6帧图像。FIG. 4B is a schematic diagram of another video frame provided by an embodiment of the disclosure. Assume the first object is a hand, N is 6, and the preset image is the first frame. Referring to FIG. 4B, initially the preset image is frame 1, so the N frames of to-be-processed images are determined as frames 1 to 6. Since frame 3 among frames 1 to 6 does not include a hand, the second image is determined among frames 1 to 6; as frame 3 does not include a hand, frame 3 is determined as the second image. The preset image is therefore updated to frame 4 (the frame following the second image), and accordingly the N frames of to-be-processed images are updated to frames 4 to 9. Since frames 4 to 9 all include a hand, frames 4 to 9 are determined as the six consecutive frames of images.
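The completed-video search above, including the Fig. 4B skip-ahead where the preset image jumps past the last frame missing the object, can be sketched as a sliding-window scan over per-frame detection flags. The function name and flag list are hypothetical stand-ins; this is a sketch of the scheme, not part of the disclosure.

```python
def find_window(frame_has_object, n, start=0):
    """Search a finished video's per-frame detection flags for n
    consecutive frames that all contain the first object, the
    candidate window starting at `start` (the preset image). On
    failure, advance past the last frame in the window that lacks
    the object (the Fig. 4B variant; the Fig. 4A variant would
    simply advance by one frame instead)."""
    while start + n <= len(frame_has_object):
        window = frame_has_object[start:start + n]
        if all(window):
            return start                # frames start .. start+n-1
        # "second image": last frame in the window without the object
        last_missing = max(i for i, ok in enumerate(window) if not ok)
        start += last_missing + 1       # preset image jumps past it
    return None

# Fig. 4B scenario (0-indexed): frame 3 of the video lacks the hand.
flags = [True, True, False, True, True, True, True, True, True]
```

Here `find_window(flags, 6)` returns 3, i.e. the window is frames 4 to 9 in the 1-indexed numbering of Fig. 4B, found in a single jump rather than three one-frame steps.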
可选的,为了避免在相同的视频帧中增加重复的特效,则确定得到的该N帧图像为未增加目标特效(待在视频中增加的特效)的图像。Optionally, in order to avoid adding repeated special effects to the same video frame, it is determined that the obtained N frames of images are images without added target special effects (special effects to be added to the video).
S202、确定每帧图像中的第一对象的姿势类型。S202: Determine the posture type of the first object in each frame of image.
可选的，可以预先设置第一对象的多种姿势类型，例如，当第一对象为手时，则手的姿势类型可以包括：双手正对打开、双手合十、握拳等。例如，当第一对象为头时，则头的姿势类型可以包括：抬头、低头、左侧偏头、右侧偏头等。Optionally, multiple posture types of the first object can be preset. For example, when the first object is a hand, the posture types of the hand can include: hands open facing each other, hands pressed together, fist, etc. When the first object is a head, the posture types of the head can include: head raised, head lowered, head tilted left, head tilted right, etc.
获取每帧图像中的第一对象的姿势类型的过程相同,下面,以获取第一 图像中的第一对象的姿势类型的过程进行说明。The process of obtaining the posture type of the first object in each frame of image is the same. In the following, the process of obtaining the posture type of the first object in the first image will be described.
针对N帧图像中的任意的第一图像，可以在第一图像中检测对象区域，对象区域中包括第一图像中与第一对象对应的部分，并对对象区域进行处理，以获取第一图像中的第一对象的姿势类型。For any first image among the N frames of images, an object region can be detected in the first image, the object region including the part of the first image corresponding to the first object, and the object region is processed to obtain the posture type of the first object in the first image.
可选的，可以通过如下可行的实现方式在第一图像中检测对象区域：将表示第一图像的数据输入至第一识别模型，以获取对象区域；其中，第一识别模型为对多组第一样本进行学习得到的，每组第一样本包括样本图像和样本图像中的样本对象区域，样本图像中包括第一对象对应的图像。Optionally, the object region can be detected in the first image through the following feasible implementation: data representing the first image is input into a first recognition model to obtain the object region; the first recognition model is obtained by learning multiple sets of first samples, each set of first samples including a sample image and a sample object region in the sample image, the sample image including an image corresponding to the first object.
表示第一图像的数据可以为第一图像、第一图像的灰度图像等。对象区域可以为第一图像中包括第一对象的一个矩形区域。The data representing the first image can be the first image itself, a grayscale image of the first image, etc. The object region can be a rectangular region of the first image that includes the first object.
由于第一识别模型为对大量的第一样本学习得到的,因此,通过第一识别模型可以在第一图像中准确的检测对象区域。Since the first recognition model is learned from a large number of first samples, the first recognition model can accurately detect the target area in the first image.
可以根据第一识别模型的输出确定对象区域。第一识别模型的输出可以为第一图像中对象区域对应的图像,也可以为对象区域的至少两个顶点在第一图像中的位置(例如坐标)。当第一识别模型的输出为对象区域的两个顶点时,该两个顶点为对角线上的两个顶点。The target area can be determined based on the output of the first recognition model. The output of the first recognition model may be an image corresponding to the object area in the first image, or may be the positions (for example, coordinates) of at least two vertices of the object area in the first image. When the output of the first recognition model is two vertices of the target area, the two vertices are two vertices on a diagonal line.
可选的，可以通过如下可行的实现方式获取第一图像中的第一对象的姿势类型：将表示对象区域的数据输入至第二识别模型，以获取第一图像中的第一对象的姿势类型；其中，第二识别模型为对多组第二样本进行学习得到的，每组第二样本包括样本对象区域和在样本对象区域中识别得到的样本姿势类型，样本对象区域中包括第一对象对应的图像。Optionally, the posture type of the first object in the first image can be obtained through the following feasible implementation: data representing the object region is input into a second recognition model to obtain the posture type of the first object in the first image; the second recognition model is obtained by learning multiple sets of second samples, each set of second samples including a sample object region and the sample posture type recognized in the sample object region, the sample object region including an image corresponding to the first object.
表示对象区域的数据可以为对象区域对应的图像,或者对象区域的至少两个顶点在第一图像中的位置(例如坐标)。当表示对象区域的数据为对象区域的两个顶点时,该两个顶点为对角线上的两个顶点。The data representing the object area may be an image corresponding to the object area, or the positions (for example, coordinates) of at least two vertices of the object area in the first image. When the data representing the target area are two vertices of the target area, the two vertices are two vertices on a diagonal line.
可以根据第二识别模型的输出确定第一图像中的第一对象的姿势类型。第二识别模型的输出可以为表示姿势类型的字符(例如,数字、字母等)。The posture type of the first object in the first image can be determined according to the output of the second recognition model. The output of the second recognition model may be characters (for example, numbers, letters, etc.) representing the type of gesture.
由于第二识别模型为对大量的第二样本学习得到的,因此,通过第二识别模型可以在对象区域中准确的确定得到第一对象的姿势类型。Since the second recognition model is learned from a large number of second samples, the posture type of the first object can be accurately determined in the object area through the second recognition model.
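The two-stage recognition described above can be sketched as a pipeline that crops the detected object region and classifies the crop. The `detect_region` and `classify_region` callables below are placeholders for the trained first and second recognition models, which the disclosure does not specify further; the toy image and stand-in models exist only to make the data flow concrete.

```python
def classify_posture(image, detect_region, classify_region):
    """Two-stage recognition sketch: a detection model returns the
    object region as two diagonal corners, the rectangular region is
    cropped out of the image, and a second model maps the crop to a
    posture-type label."""
    (x1, y1), (x2, y2) = detect_region(image)    # diagonal corners
    crop = [row[x1:x2] for row in image[y1:y2]]  # rectangular object region
    return classify_region(crop)

# Toy stand-ins: "detect" a fixed 2x2 region; "classify" by the sum
# of the cropped pixels. Real models would be learned from samples.
detect = lambda img: ((1, 1), (3, 3))
classify = lambda crop: "hands_open" if sum(map(sum, crop)) > 2 else "fist"
image = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
```

Passing the region as corner coordinates rather than a cropped image mirrors the note above that the first model's output may be either the region image or the positions of two diagonal vertices.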
S203、根据每帧图像中的第一对象的姿势类型,确定第一对象的姿势分布。S203: Determine the posture distribution of the first object according to the posture type of the first object in each frame of image.
其中,第一对象的姿势分布用于指示第一对象的姿势的变化规律。Wherein, the posture distribution of the first object is used to indicate the law of change of the posture of the first object.
例如，假设第一对象为手，N为6，该6帧图像中的第一对象的姿势类型依次为：双手正对打开、双手正对打开、双手正对打开、双手合十、双手合十、双手合十。由此，可以得到第一对象的姿势分布为：双手正对打开到双手合十。For example, assume the first object is a hand and N is 6, and the posture types of the first object in the six frames of images are, in order: hands open facing each other, hands open facing each other, hands open facing each other, hands pressed together, hands pressed together, hands pressed together. The posture distribution of the first object is thus obtained as: from hands open facing each other to hands pressed together.
可选的，为了提高获取第一对象的姿势分布的准确性，可以通过如下可行的实现方式获取第一对象的姿势分布：按照N帧图像在视频中的顺序，对N帧图像进行分组，得到至少两组图像，每组图像中包括连续的M帧图像，M为大于1的整数；根据每组图像中每个图像中的第一对象的姿势类型，确定每组图像对应的姿势类型；根据每组图像对应的姿势类型，获取第一对象的姿势分布。Optionally, to improve the accuracy of obtaining the posture distribution of the first object, the posture distribution can be obtained through the following feasible implementation: the N frames of images are grouped according to their order in the video to obtain at least two groups of images, each group including M consecutive frames, M being an integer greater than 1; the posture type corresponding to each group of images is determined according to the posture type of the first object in each image of the group; and the posture distribution of the first object is obtained according to the posture type corresponding to each group of images.
可选的，针对任意一组图像，若该组图像中大于或等于第一阈值个图像的姿势类型为第一姿势类型，则确定该组图像对应的姿势类型为第一姿势类型。Optionally, for any group of images, if the posture type of a number of images in the group greater than or equal to a first threshold is a first posture type, the posture type corresponding to the group is determined to be the first posture type.
例如，假设M为3，第一阈值为2，则当一组图像中存在2个或3个图像对应的姿势类型为双手合十类型时，则确定该组图像对应的姿势类型为双手合十类型。For example, assume M is 3 and the first threshold is 2. When the posture type of two or three images in a group is the hands-pressed-together type, the posture type corresponding to that group is determined to be the hands-pressed-together type.
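The threshold rule above amounts to majority voting within each group of M frames. A minimal sketch, with hypothetical posture-type labels:

```python
from collections import Counter

def group_posture(frame_types, threshold):
    """Posture type of one group of M frames: if some type occurs in
    at least `threshold` of the frames, it becomes the group's type;
    otherwise no type is assigned (None)."""
    label, count = Counter(frame_types).most_common(1)[0]
    return label if count >= threshold else None
```

With M = 3 and a threshold of 2, a single mis-recognized frame does not change the group's result, which is the fault-tolerance property noted below.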
例如，假设N为9，该9帧图像分别记为图像1、图像2、……、图像9，M为3，则对该9帧图像的分组、以及确定得到的各图像组对应的姿势类型可以为表1所示：For example, assume N is 9, the nine frames of images are denoted image 1, image 2, ..., image 9, and M is 3. The grouping of the nine frames and the posture type determined for each group can then be as shown in Table 1:
表1Table 1
（表1在公开文本中以图片形式给出，此处无法复现；其列出了图像1至图像9按M=3分成的各图像组及每组确定的姿势类型。）(Table 1 appears as images in the published application and cannot be reproduced here; it lists the groups of images 1 to 9 divided with M = 3 and the posture type determined for each group.)
需要说明的是，表1只是以示例的形式示意对图像进行的分组，以及各图像组对应的姿势类型。It should be noted that Table 1 merely illustrates, by way of example, the grouping of the images and the posture type corresponding to each image group.
在上述过程中,即使对个别图像中的第一对象的姿势类型识别错误,依然可以获取得到正确的第一对象的姿势分布,使得视频处理的容错性能较高。In the above process, even if the posture type of the first object in the individual image is incorrectly recognized, the correct posture distribution of the first object can still be obtained, so that the error tolerance performance of the video processing is higher.
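One way to read the "posture distribution" described above is as the ordered sequence of distinct posture types across the groups. This sketch, with made-up labels, collapses consecutive duplicate group types and can then be compared against a preset distribution such as the hands-open-to-hands-together example; the representation is an assumption, since the disclosure does not fix a data format.

```python
def posture_distribution(group_types):
    """Collapse the per-group posture types into the object's posture
    distribution, i.e. the order in which distinct postures appear
    (consecutive duplicates merged, undecided groups skipped)."""
    dist = []
    for t in group_types:
        if t is not None and (not dist or dist[-1] != t):
            dist.append(t)
    return dist

# A preset distribution in the style of the example above:
# hands open facing each other, then hands pressed together.
preset = ["open", "together"]
```

A distribution of `["open", "open", "together"]` across three groups collapses to `["open", "together"]` and matches `preset`, at which point the corresponding special effect would be added.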
S204、根据第一对象的姿势分布和N帧图像,在视频中增加特效。S204: Add special effects to the video according to the posture distribution of the first object and the N frames of images.
可选的，可以判断第一对象的姿势分布是否满足预设姿势分布，在第一对象的姿势分布满足预设姿势分布时，获取预设姿势分布对应的目标特效，并根据N帧图像在视频中增加目标特效。Optionally, it can be determined whether the posture distribution of the first object satisfies a preset posture distribution; when it does, the target special effect corresponding to the preset posture distribution is obtained, and the target special effect is added to the video according to the N frames of images.
可选的,当视频处理的应用场景不同时,根据N帧图像在视频中增加目标特效的过程也不同。Optionally, when the application scenarios of video processing are different, the process of adding target special effects to the video according to N frames of images is also different.
一种可能的应用场景:视频为正在拍摄的视频,即,一边进行视频拍摄,一边在正在拍摄的视频中增加特效。A possible application scenario: the video is a video being shot, that is, while the video is being shot, special effects are added to the video being shot.
在该种应用场景下，可以在N帧图像中的第N帧图像中增加特效。或者，在第N帧图像对应的播放时刻在视频中增加特效，特效的显示时长可以为预设时长。In this application scenario, the special effect can be added to the Nth frame of the N frames of images. Alternatively, the special effect is added to the video at the playback moment corresponding to the Nth frame, and the display duration of the special effect can be a preset duration.
另一种可能的应用场景:视频为拍摄完成的视频,即,在已拍摄完成的视频中增加特效。Another possible application scenario: the video is a completed video, that is, special effects are added to the completed video.
在该种应用场景下,可以在N帧图像中的至少一帧图像中增加特效。例如,可以在N帧图像中全部增加特效,即,在该N帧图像对应的播放时刻之间在视频中增加特效。或者,在该N帧图像中的部分图像中增加特效,即,在该N帧图像中的部分图像对应的播放时刻之间在视频中增加特效。In this application scenario, special effects can be added to at least one of the N frames of images. For example, special effects can be added to all N frames of images, that is, special effects can be added to the video between the playback moments corresponding to the N frames of images. Alternatively, special effects are added to some of the N frames of images, that is, special effects are added to the video between playback moments corresponding to the partial images of the N frames of images.
本公开实施例提供的视频处理方法，当需要在视频中增加第一对象对应的特效时，在视频中确定连续的、包括第一对象的N帧图像，获取每帧图像中的第一对象的姿势类型，并根据每帧图像中的第一对象的姿势类型，获取第一对象的姿势分布，根据第一对象的姿势分布和N帧图像，在视频中增加特效。在上述过程中，以视频帧为单位，确定视频中第一对象的姿势分布，根据视频中第一对象的姿势分布，可以准确的确定得到视频中是否出现预设动作，进而可以准确的确定得到是否在视频中增加特效。在确定在视频中增加特效时，根据连续的N帧图像在视频中增加特效，即，可以以视频帧为粒度在视频中增加特效，提高了增加特效的精确度。In the video processing method provided by the embodiments of the present disclosure, when a special effect corresponding to a first object needs to be added to a video, N consecutive frames of images including the first object are determined in the video, the posture type of the first object in each frame is obtained, the posture distribution of the first object is obtained according to the posture type of the first object in each frame, and the special effect is added to the video according to the posture distribution of the first object and the N frames of images. In the above process, the posture distribution of the first object in the video is determined on a per-frame basis; according to this posture distribution, whether a preset action appears in the video can be determined accurately, and hence whether to add the special effect can be determined accurately. When it is determined that the special effect is to be added, it is added on the basis of the N consecutive frames of images, that is, at the granularity of video frames, which improves the precision of adding special effects.
在上述任意一个实施例的基础上,下面,通过图5所示的实施例,对视频处理方法进行详细说明。On the basis of any one of the above embodiments, the following describes the video processing method in detail through the embodiment shown in FIG. 5.
图5为本公开实施例提供的另一种视频处理方法的流程示意图。请参见图5,该方法可以包括:FIG. 5 is a schematic flowchart of another video processing method provided by an embodiment of the disclosure. Referring to Figure 5, the method may include:
S501、在视频中获取连续的N帧图像。S501: Acquire consecutive N frames of images in the video.
需要说明的是,S501的执行过程可以参见S202的执行过程,此处不再进行赘述。It should be noted that, for the execution process of S501, refer to the execution process of S202, which will not be repeated here.
S502、按照N帧图像在视频中的顺序,对N帧图像进行分组,得到至少两组图像。S502: Group the N frames of images according to the order of the N frames of images in the video to obtain at least two sets of images.
其中,每组图像中包括连续的M帧图像,M为大于1的整数。Wherein, each group of images includes consecutive M frames of images, and M is an integer greater than 1.
从N帧图像中的第一帧图像起,依次将连续的M帧图像分为一组,得到至少两组图像。例如,N帧图像中的第1帧图像至第M帧图像分为一组,第M+1帧图像至第2M帧图像分为一组,依次类推,直至将N帧图像分组完毕。Starting from the first frame of the N frame images, successively divide the consecutive M frame images into one group to obtain at least two groups of images. For example, the first frame image to the Mth frame image in the N frame images are grouped into one group, the M+1th frame image to the 2Mth frame image are grouped into one group, and so on, until the N frame images are grouped.
可选的,N为M的整数倍。Optionally, N is an integer multiple of M.
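The grouping in S502 can be sketched in a few lines of Python. This is an illustrative sketch only; the function name and the list-of-frames representation are assumptions, not part of the disclosure:

```python
def group_frames(frames, m):
    """Split N consecutive frames into groups of M, keeping video order.

    Assumes len(frames) is a multiple of m, matching the optional
    condition that N is an integer multiple of M.
    """
    return [frames[i:i + m] for i in range(0, len(frames), m)]

# N = 6 frames, M = 3: frames 1..M form one group, M+1..2M the next
groups = group_frames(["P1", "P2", "P3", "P4", "P5", "P6"], 3)
print(groups)  # [['P1', 'P2', 'P3'], ['P4', 'P5', 'P6']]
```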
S503、确定每组图像中每个图像中的第一对象的姿势类型。S503: Determine the posture type of the first object in each image in each group of images.
需要说明的是,S503的执行过程可以参见S202的执行过程,此处不再进行赘述。It should be noted that the execution process of S503 can be referred to the execution process of S202, which will not be repeated here.
S504、根据每组图像中每个图像中的第一对象的姿势类型,确定每组图像对应的姿势类型。S504: Determine a posture type corresponding to each group of images according to the posture type of the first object in each image in each group of images.
针对任意一组图像,若该组图像中大于或等于第一阈值个图像的姿势类型为第一姿势类型,则确定该组图像对应的姿势类型为第一姿势类型。For any group of images, if the posture type of the images in the group of images greater than or equal to the first threshold is the first posture type, it is determined that the posture type corresponding to the group of images is the first posture type.
例如，假设M为3，第一阈值为2，则当一组图像中存在2个或3个图像对应的姿势类型为双手合十类型时，则确定该组图像对应的姿势类型为双手合十类型。For example, assuming that M is 3 and the first threshold is 2, when the posture type corresponding to 2 or 3 images in a group of images is the hands-folded type, the posture type corresponding to the group of images is determined to be the hands-folded type.
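The per-group decision of S504 amounts to threshold voting over the per-image posture types. A minimal sketch (function and label names are assumptions for illustration):

```python
from collections import Counter

def group_pose_type(pose_types, first_threshold):
    """Return the pose type of a group when at least `first_threshold`
    of its images share that type; otherwise return None."""
    pose, count = Counter(pose_types).most_common(1)[0]
    return pose if count >= first_threshold else None

# M = 3, first threshold = 2: two of three frames show hands folded,
# so the group's pose type is "hands folded"
print(group_pose_type(["hands folded", "hands folded", "hands open"], 2))
# hands folded
```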
S505、根据每组图像对应的姿势类型,确定第一对象的姿势分布。S505: Determine the posture distribution of the first object according to the posture type corresponding to each group of images.
例如，若第一对象为手，在S502中确定得到2组图像，假设第一组图像对应的姿势类型为双手正对打开，第二组图像对应的姿势类型为双手合十，则第一对象的姿势分布为双手正对打开到双手合十。For example, if the first object is a hand and 2 groups of images are obtained in S502, assuming that the posture type corresponding to the first group of images is hands open facing each other and the posture type corresponding to the second group of images is hands folded, the posture distribution of the first object is from hands open facing each other to hands folded.
S506、判断第一对象的姿势分布是否满足预设姿势分布。S506: Determine whether the posture distribution of the first object meets the preset posture distribution.
若是,则执行S507-S508。If yes, execute S507-S508.
若否,则执行S501。If not, execute S501.
可选的，若第一对象的姿势分布所指示的第一对象的姿势的变化规律，与预设姿势分布所指示的第一对象的姿势的变化规律相同，则确定第一对象的姿势分布满足预设姿势分布。Optionally, if the change pattern of the posture of the first object indicated by the posture distribution of the first object is the same as the change pattern indicated by the preset posture distribution, it is determined that the posture distribution of the first object satisfies the preset posture distribution.
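The comparison in S506 can be sketched as follows. One reading of "same change pattern" is that consecutive duplicate posture types are collapsed so only the order of posture changes is compared; this interpretation, like the function names, is an assumption for illustration:

```python
def matches_preset(pose_sequence, preset_sequence):
    """Check whether the observed per-group pose types change in the
    same order as the preset posture distribution (consecutive
    duplicates collapsed, so only the pattern of change matters)."""
    def collapse(seq):
        out = []
        for p in seq:
            if not out or out[-1] != p:
                out.append(p)
        return out
    return collapse(pose_sequence) == collapse(preset_sequence)

# "open then folded" matches a preset that also goes open -> folded
print(matches_preset(["hands open", "hands folded"],
                     ["hands open", "hands open", "hands folded"]))  # True
```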
S507、获取预设姿势分布对应的目标特效。S507: Obtain a target special effect corresponding to the preset posture distribution.
可选的，可以预先设置姿势分布与特效之间的对应关系，相应的，可以根据预设姿势分布和该对应关系确定目标特效。Optionally, a correspondence between posture distributions and special effects may be preset; accordingly, the target special effect may be determined according to the preset posture distribution and this correspondence.
S508、根据N帧图像在视频中增加目标特效。S508: Add target special effects to the video according to the N frames of images.
需要说明的是,S508的执行过程可以参见S204的执行过程,此处不再进行赘述。It should be noted that, for the execution process of S508, refer to the execution process of S204, which will not be repeated here.
在图5所示的实施例中，以视频帧为单位，确定视频中第一对象的姿势分布，根据视频中第一对象的姿势分布，可以准确的确定得到视频中是否出现预设动作，进而可以准确的确定得到是否在视频中增加特效。在确定在视频中增加特效时，根据连续的N帧图像在视频中增加特效，即，可以以视频帧为粒度在视频中增加特效，提高了增加特效的精确度。进一步的，即使对个别图像中的第一对象的姿势类型识别错误，依然可以获取得到正确的第一对象的姿势分布，使得视频处理的容错性能较高。In the embodiment shown in FIG. 5, the posture distribution of the first object in the video is determined in units of video frames. According to this posture distribution, whether a preset action appears in the video can be accurately determined, and therefore whether to add the special effect to the video can be accurately determined. When it is determined that the special effect is to be added, it is added based on N consecutive frames of images, that is, at the granularity of video frames, which improves the accuracy of adding the special effect. Further, even if the posture type of the first object is incorrectly recognized in an individual image, the correct posture distribution of the first object can still be obtained, so that the video processing has high fault tolerance.
在上述任意一个实施例的基础上,下面,结合图6,通过具体示例,对上述方法实施例所示的视频处理方法进行详细说明。On the basis of any one of the foregoing embodiments, the video processing method shown in the foregoing method embodiment will be described in detail below with reference to FIG. 6 through specific examples.
图6为本申请实施例提供的视频处理过程示意图。假设第一对象为手,N为6,待增加的特效为撒花。请参见图6,假设确定得到的6张图像分别为P1、P2、……、P6。FIG. 6 is a schematic diagram of a video processing process provided by an embodiment of the application. Assuming that the first object is a hand, N is 6, and the special effect to be added is flower spreading. Please refer to Fig. 6, assuming that the six images obtained are P1, P2, ..., P6.
请参见图6，将P1、P2和P3分为一组图像，将P4、P5和P6分为一组图像。分别将表示该6张图像的数据输入至第一预设模型，得到每张图像中的对象区域，其中，对象区域中包括手。分别将表示6张图像中的对象区域输入至第二预设模型，得到手的姿势类型，例如，确定得到的手的姿势类型分别为：双手正对打开、双手正对打开、双手合十、双手合十、双手合十、双手合十。由此可以确定第一组图像对应的姿势类型为双手正对打开，第二组图像对应的姿势类型为双手合十，因此，可以确定第一对象（手）对应的姿势分布为：双手正对打开到双手合十，确定该姿势分布满足预设姿势分布，则在该6张图像中增加撒花特效。当然，还可以在该6张图像中的部分图像中增加撒花特效。Referring to FIG. 6, P1, P2, and P3 are divided into one group of images, and P4, P5, and P6 into another group. The data representing the 6 images are respectively input into the first preset model to obtain the object region in each image, where the object region includes the hand. The object regions in the 6 images are then respectively input into the second preset model to obtain the posture type of the hand; for example, the determined posture types are: hands open facing each other, hands open facing each other, hands folded, hands folded, hands folded, hands folded. From this it can be determined that the posture type corresponding to the first group of images is hands open facing each other and that corresponding to the second group is hands folded; therefore, the posture distribution corresponding to the first object (the hand) is: from hands open facing each other to hands folded. If it is determined that this posture distribution satisfies the preset posture distribution, the flower-spreading special effect is added to the 6 images. Of course, the special effect may also be added to only some of the 6 images.
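Putting the FIG. 6 example together, the overall flow can be sketched in Python. Here `detect_region` and `classify_pose` are hypothetical stand-ins for the first and second preset models, and all names are illustrative assumptions:

```python
from collections import Counter

def process_clip(frames, m, threshold, preset, detect_region, classify_pose):
    """Sketch of the FIG. 6 flow: per-frame region detection and pose
    classification, per-group threshold voting, then a comparison of
    the resulting posture distribution with the preset one. Returns
    True when the target effect should be added to the frames."""
    # Stage 1: per-frame pose type via the two preset models
    poses = [classify_pose(detect_region(f)) for f in frames]
    # Stage 2: per-group pose type by threshold voting
    group_types = []
    for i in range(0, len(poses), m):
        pose, count = Counter(poses[i:i + m]).most_common(1)[0]
        group_types.append(pose if count >= threshold else None)
    # Stage 3: compare the posture distribution with the preset one
    return group_types == preset

# Toy stand-ins for the two preset models (identity functions)
detect = lambda frame: frame
classify = lambda region: region
clip = ["open", "open", "folded", "folded", "folded", "folded"]
print(process_clip(clip, 3, 2, ["open", "folded"], detect, classify))  # True
```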
在图6所示的实施例中，以视频帧为单位，确定视频中第一对象的姿势分布，根据视频中第一对象的姿势分布，可以准确的确定得到视频中是否出现预设动作，进而可以准确的确定得到是否在视频中增加特效。在确定在视频中增加特效时，根据连续的N帧图像在视频中增加特效，即，可以以视频帧为粒度在视频中增加特效，提高了增加特效的精确度。进一步的，即使对个别图像中的第一对象的姿势类型识别错误，依然可以获取得到正确的第一对象的姿势分布，使得视频处理的容错性能较高。In the embodiment shown in FIG. 6, the posture distribution of the first object in the video is determined in units of video frames. According to this posture distribution, whether a preset action appears in the video can be accurately determined, and therefore whether to add the special effect to the video can be accurately determined. When it is determined that the special effect is to be added, it is added based on N consecutive frames of images, that is, at the granularity of video frames, which improves the accuracy of adding the special effect. Further, even if the posture type of the first object is incorrectly recognized in an individual image, the correct posture distribution of the first object can still be obtained, so that the video processing has high fault tolerance.
图7为本公开实施例提供的一种视频处理装置的结构示意图。请参见图7,该视频处理装置10可以包括获取模块11、第一确定模块12、第二确定模块13和增加模块14,其中,FIG. 7 is a schematic structural diagram of a video processing device provided by an embodiment of the disclosure. Referring to FIG. 7, the video processing device 10 may include an acquiring module 11, a first determining module 12, a second determining module 13, and an adding module 14.
所述获取模块11用于,在视频中获取连续的N帧图像,每帧所述图像中均包括第一对象,所述N为大于1的整数;The acquiring module 11 is configured to acquire consecutive N frames of images in a video, each frame of the image includes a first object, and the N is an integer greater than 1;
所述第一确定模块12用于,确定每帧图像中的所述第一对象的姿势类型;The first determining module 12 is configured to determine the posture type of the first object in each frame of image;
所述第二确定模块13用于，根据每帧图像中的所述第一对象的姿势类型，确定所述第一对象的姿势分布，所述姿势分布用于指示所述第一对象的姿势的变化规律；The second determining module 13 is configured to determine the posture distribution of the first object according to the posture type of the first object in each frame of image, where the posture distribution is used to indicate the change pattern of the posture of the first object;
所述增加模块14用于,根据所述第一对象的姿势分布和所述N帧图像,在所述视频中增加特效。The adding module 14 is configured to add special effects to the video according to the posture distribution of the first object and the N frames of images.
本公开实施例提供的视频处理装置可以执行上述方法实施例所示的技术方案,其实现原理以及有益效果类似,此处不再进行赘述。The video processing device provided in the embodiments of the present disclosure can execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects are similar, and details are not described herein again.
在一种可能的实施方式中,所述增加模块14具体用于:In a possible implementation manner, the adding module 14 is specifically configured to:
判断所述第一对象的姿势分布是否满足预设姿势分布;Judging whether the posture distribution of the first object meets a preset posture distribution;
在所述第一对象的姿势分布满足预设姿势分布时,获取所述预设姿势分布对应的目标特效,并根据所述N帧图像在所述视频中增加所述目标特效。When the posture distribution of the first object satisfies a preset posture distribution, obtain a target special effect corresponding to the preset posture distribution, and add the target special effect to the video according to the N frames of images.
在一种可能的实施方式中,所述第二确定模块13具体用于:In a possible implementation manner, the second determining module 13 is specifically configured to:
按照所述N帧图像在所述视频中的顺序，对所述N帧图像进行分组，得到至少两组图像，每组图像中包括连续的M帧图像，所述M为大于1的整数；According to the order of the N frames of images in the video, the N frames of images are grouped to obtain at least two groups of images, where each group of images includes consecutive M frames of images, and M is an integer greater than 1;
根据每组图像中每个图像中的所述第一对象的姿势类型,确定每组图像对应的姿势类型;Determine the posture type corresponding to each group of images according to the posture type of the first object in each image in each group of images;
根据每组图像对应的姿势类型,获取所述第一对象的姿势分布。Obtain the posture distribution of the first object according to the posture type corresponding to each group of images.
在一种可能的实施方式中,针对所述N帧图像中的任意的第一图像,所述第一确定模块12具体用于:In a possible implementation manner, for any first image in the N frames of images, the first determining module 12 is specifically configured to:
在所述第一图像中检测对象区域,所述对象区域中包括所述第一图像中与所述第一对象对应的部分;Detecting an object area in the first image, where the object area includes a part of the first image corresponding to the first object;
对所述对象区域进行处理,以获取所述第一图像中的所述第一对象的姿势类型。The object area is processed to obtain the posture type of the first object in the first image.
在一种可能的实施方式中,所述第一确定模块12具体用于:In a possible implementation manner, the first determining module 12 is specifically configured to:
将表示所述第一图像的数据输入至第一识别模型，以获取所述对象区域；其中，所述第一识别模型为对多组第一样本进行学习得到的，每组第一样本包括样本图像和所述样本图像中的样本对象区域，所述样本图像中包括所述第一对象对应的图像。The data representing the first image is input into a first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, each group of first samples including a sample image and a sample object area in the sample image, where the sample image includes an image corresponding to the first object.
在一种可能的实施方式中,所述第一确定模块12具体用于:In a possible implementation manner, the first determining module 12 is specifically configured to:
将表示所述对象区域的数据输入至第二识别模型，以获取所述第一图像中的所述第一对象的姿势类型；其中，所述第二识别模型为对多组第二样本进行学习得到的，每组第二样本包括样本对象区域和在所述样本对象区域中识别得到的样本姿势类型，所述样本对象区域中包括所述第一对象对应的图像。The data representing the object area is input into a second recognition model to obtain the posture type of the first object in the first image; the second recognition model is obtained by learning multiple groups of second samples, each group of second samples including a sample object area and a sample posture type recognized in the sample object area, where the sample object area includes an image corresponding to the first object.
在一种可能的实施方式中,所述视频为正在拍摄的视频;所述获取模块11具体用于:In a possible implementation manner, the video is a video being shot; the acquisition module 11 is specifically configured to:
在所述视频中获取N帧待处理图像,所述N帧待处理图像中包括所述视频中已拍摄的最后N帧图像;Acquiring N frames of to-be-processed images in the video, and the N frames of to-be-processed images include the last N frames of images that have been taken in the video;
判断所述N帧待处理图像中是否每帧待处理图像中均包括所述第一对象,若是,则将所述N帧待处理图像确定为所述N帧图像。It is determined whether each of the N frames of images to be processed includes the first object, and if so, the N frames of images to be processed are determined as the N frames of images.
在一种可能的实施方式中,所述增加模块14具体用于:In a possible implementation manner, the adding module 14 is specifically configured to:
在所述N帧图像中的第N帧图像中增加所述特效。Adding the special effect to the Nth frame of the N frames of images.
在一种可能的实施方式中，所述视频为拍摄完成的视频；所述获取模块11具体用于：In a possible implementation manner, the video is a completed video; the acquisition module 11 is specifically configured to:
执行待处理图像选择操作,所述待处理图像选择操作包括:从所述视频的预设图像起,在所述视频中获取连续的N帧待处理图像;Performing a to-be-processed image selection operation, the to-be-processed image selection operation includes: acquiring, from a preset image of the video, consecutive N frames of to-be-processed images in the video;
执行N帧图像确定操作，所述N帧图像确定操作包括：判断所述N帧待处理图像中是否每帧待处理图像中均包括所述第一对象对应的图像，若是，则将所述N帧待处理图像确定为所述N帧图像，若否，则将所述预设图像更新为所述视频中所述预设图像之后的一帧图像；Performing an N-frame image determining operation, which includes: determining whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images as the N frames of images; if not, updating the preset image to a frame of image after the preset image in the video;
重复执行所述待处理图像选择操作和所述N帧图像确定操作,直至确定得到所述N帧图像。Repeat the operation of selecting the image to be processed and the operation of determining the N frames of images until it is determined that the N frames of images are obtained.
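The repeated selection and determination operations above amount to a sliding search over the finished video. A minimal sketch, assuming the preset image advances by one frame on each failed check (the function and predicate names are illustrative assumptions):

```python
def find_n_frames(video, n, contains_first_object, start=0):
    """Sliding search over a finished video: starting at a preset
    frame, take N consecutive to-be-processed frames; if every frame
    contains the first object, return them as the N frames of images,
    otherwise advance the start and repeat until the video is exhausted."""
    while start + n <= len(video):
        window = video[start:start + n]
        if all(contains_first_object(f) for f in window):
            return window
        start += 1
    return None

video = ["bg", "hand", "hand", "hand", "bg"]
print(find_n_frames(video, 3, lambda f: f == "hand"))
# ['hand', 'hand', 'hand']
```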
在一种可能的实施方式中,所述增加模块14具体用于:In a possible implementation manner, the adding module 14 is specifically configured to:
在所述N帧图像中的至少一帧图像中增加所述特效。The special effect is added to at least one of the N frames of images.
在一种可能的实施方式中,所述获取模块11具体用于:In a possible implementation manner, the obtaining module 11 is specifically configured to:
确定待在所述视频中增加的特效;Determine the special effects to be added to the video;
确定待在所述视频中增加的特效对应的所述第一对象;Determine the first object corresponding to the special effect to be added in the video;
根据所述第一对象,在所述视频中确定所述N帧图像。According to the first object, the N frames of images are determined in the video.
图8为本公开实施例提供的另一种视频处理装置的结构示意图。在图7所示实施例的基础上,请参见图8,视频处理装置10还包括第三确定模块15,其中,FIG. 8 is a schematic structural diagram of another video processing device provided by an embodiment of the disclosure. Based on the embodiment shown in FIG. 7, referring to FIG. 8, the video processing device 10 further includes a third determining module 15, where:
所述第三确定模块15用于,在所述获取模块11在视频中获取连续的N帧图像之前,确定未在所述N帧图像中增加所述目标特效。The third determining module 15 is configured to determine that the target special effect is not added to the N frames of images before the acquiring module 11 acquires consecutive N frames of images in the video.
本公开实施例提供的视频处理装置可以执行上述方法实施例所示的技术方案,其实现原理以及有益效果类似,此处不再进行赘述。The video processing device provided in the embodiments of the present disclosure can execute the technical solutions shown in the foregoing method embodiments, and the implementation principles and beneficial effects are similar, and details are not described herein again.
图9为本公开实施例提供的电子设备的结构示意图。电子设备20可以为终端设备或服务器。其中,终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、个人数字助理(Personal Digital Assistant,简称PDA)、平板电脑(Portable Android Device,简称PAD)、便携式多媒体播放器(Portable Media Player,简称PMP)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图9示出的电子设备仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure. The electronic device 20 may be a terminal device or a server. Among them, terminal devices may include, but are not limited to, mobile phones, notebook computers, digital broadcast receivers, personal digital assistants (Personal Digital Assistant, PDA for short), tablets (Portable Android Device, PAD for short), portable multimedia players (Portable Media Player, PMP for short), mobile terminals such as vehicle-mounted terminals (for example, vehicle navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 9 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
请参见图9，电子设备20可以包括处理装置（例如中央处理器、图形处理器等）21，其可以根据存储在只读存储器（Read Only Memory，简称ROM）22中的程序或者从存储装置28加载到随机访问存储器（Random Access Memory，简称RAM）23中的程序而执行各种适当的动作和处理。在RAM 23中，还存储有电子设备20操作所需的各种程序和数据。处理装置21、ROM 22以及RAM 23通过总线24彼此相连。输入/输出（I/O）接口25也连接至总线24。Referring to FIG. 9, the electronic device 20 may include a processing device (such as a central processing unit, a graphics processor, etc.) 21, which may execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 22 or a program loaded from a storage device 28 into a random-access memory (RAM) 23. The RAM 23 also stores various programs and data required for the operation of the electronic device 20. The processing device 21, the ROM 22, and the RAM 23 are connected to each other through a bus 24. An input/output (I/O) interface 25 is also connected to the bus 24.
通常，以下装置可以连接至I/O接口25：包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置26；包括例如液晶显示器（Liquid Crystal Display，简称LCD）、扬声器、振动器等的输出装置27；包括例如磁带、硬盘等的存储装置28；以及通信装置29。通信装置29可以允许电子设备20与其他设备进行无线或有线通信以交换数据。虽然图9示出了具有各种装置的电子设备20，但是应理解的是，并不要求实施或具备所有示出的装置。可以替代地实施或具备更多或更少的装置。Generally, the following devices may be connected to the I/O interface 25: input devices 26 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 27 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices 28 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 29. The communication device 29 may allow the electronic device 20 to perform wireless or wired communication with other devices to exchange data. Although FIG. 9 shows the electronic device 20 having various devices, it should be understood that it is not required to implement or have all the illustrated devices; more or fewer devices may alternatively be implemented or provided.
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置29从网络上被下载和安装,或者从存储装置28被安装,或者从ROM22被安装。在该计算机程序被处理装置21执行时,执行本公开实施例的方法中限定的上述功能。In particular, according to an embodiment of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network through the communication device 29, or installed from the storage device 28, or installed from the ROM 22. When the computer program is executed by the processing device 21, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
需要说明的是，本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件，或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于：具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器（RAM）、只读存储器（ROM）、可擦式可编程只读存储器（EPROM或闪存）、光纤、便携式紧凑磁盘只读存储器（CD-ROM）、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开中，计算机可读存储介质可以是任何包含或存储程序的有形介质，该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开中，计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号，其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式，包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质，该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输，包括但不限于：电线、光缆、RF（射频）等等，或者上述的任意合适的组合。It should be noted that the above-mentioned computer-readable medium in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that may be used by or in combination with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium; it may send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium may be transmitted by any suitable medium, including but not limited to: a wire, an optical cable, RF (radio frequency), etc., or any suitable combination of the above.
上述计算机可读介质可以是上述电子设备中所包含的;也可以是单独存在,而未装配入该电子设备中。The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or it may exist alone without being assembled into the electronic device.
上述计算机可读介质承载有一个或者多个程序,当上述一个或者多个程序被该电子设备执行时,使得该电子设备执行上述实施例所示的方法。The foregoing computer-readable medium carries one or more programs, and when the foregoing one or more programs are executed by the electronic device, the electronic device is caused to execute the method shown in the foregoing embodiment.
可以以一种或多种程序设计语言或其组合来编写用于执行本公开的操作的计算机程序代码，上述程序设计语言包括面向对象的程序设计语言—诸如Java、Smalltalk、C++，还包括常规的过程式程序设计语言—诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中，远程计算机可以通过任意种类的网络——包括局域网（Local Area Network，简称LAN）或广域网（Wide Area Network，简称WAN）—连接到用户计算机，或者，可以连接到外部计算机（例如利用因特网服务提供商来通过因特网连接）。The computer program code used to perform the operations of the present disclosure may be written in one or more programming languages or a combination thereof; these programming languages include object-oriented programming languages, such as Java, Smalltalk, and C++, as well as conventional procedural programming languages, such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
附图中的流程图和框图，图示了按照本公开各种实施例的系统、方法和计算机程序产品的可能实现的体系架构、功能和操作。在这点上，流程图或框图中的每个方框可以代表一个模块、程序段、或代码的一部分，该模块、程序段、或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意，在有些作为替换的实现中，方框中所标注的功能也可以以不同于附图中所标注的顺序发生。例如，两个接连地表示的方框实际上可以基本并行地执行，它们有时也可以按相反的顺序执行，这依所涉及的功能而定。也要注意的是，框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合，可以用执行规定的功能或操作的专用的基于硬件的系统来实现，或者可以用专用硬件与计算机指令的组合来实现。The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code that contains one or more executable instructions for realizing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in a different order from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
描述于本公开实施例中所涉及到的单元可以通过软件的方式实现,也可以通过硬件的方式来实现。The units involved in the embodiments described in the present disclosure may be implemented in a software manner, or may be implemented in a hardware manner.
最后应说明的是：以上各实施例仅用以说明本公开实施例的技术方案，而非对其限制；尽管参照前述各实施例对本公开实施例进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分或者全部技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本公开实施例方案的范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the embodiments of the present disclosure, not to limit them. Although the embodiments of the present disclosure have been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions recorded in the foregoing embodiments, or equivalently replace some or all of the technical features therein; and these modifications or replacements do not make the essence of the corresponding technical solutions depart from the scope of the solutions of the embodiments of the present disclosure.

Claims (15)

  1. 一种视频处理方法,其特征在于,包括:A video processing method, characterized by comprising:
    在视频中获取连续的N帧图像,每帧所述图像中均包括第一对象,所述N为大于1的整数;Acquire consecutive N frames of images in the video, each frame of the image includes the first object, and the N is an integer greater than 1;
    确定每帧图像中的所述第一对象的姿势类型,并根据每帧图像中的所述第一对象的姿势类型,确定所述第一对象的姿势分布,所述姿势分布用于指示所述第一对象的姿势的变化规律;Determine the posture type of the first object in each frame of image, and determine the posture distribution of the first object according to the posture type of the first object in each frame of image, and the posture distribution is used to indicate the The law of change of the posture of the first object;
    根据所述第一对象的姿势分布和所述N帧图像,在所述视频中增加特效。According to the posture distribution of the first object and the N frames of images, special effects are added to the video.
  2. 根据权利要求1所述的方法,其特征在于,根据所述第一对象的姿势分布和所述N帧图像,在所述视频中增加特效,包括:The method according to claim 1, wherein adding special effects to the video according to the posture distribution of the first object and the N frames of images comprises:
    判断所述第一对象的姿势分布是否满足预设姿势分布;Judging whether the posture distribution of the first object meets a preset posture distribution;
    在所述第一对象的姿势分布满足预设姿势分布时,获取所述预设姿势分布对应的目标特效,并根据所述N帧图像在所述视频中增加所述目标特效。When the posture distribution of the first object satisfies a preset posture distribution, obtain a target special effect corresponding to the preset posture distribution, and add the target special effect to the video according to the N frames of images.
  3. 根据权利要求1或2所述的方法,其特征在于,根据每帧图像中的所述第一对象的姿势类型,确定所述第一对象的姿势分布,包括:The method according to claim 1 or 2, wherein the determining the posture distribution of the first object according to the posture type of the first object in each frame of image comprises:
    按照所述N帧图像在所述视频中的顺序，对所述N帧图像进行分组，得到至少两组图像，每组图像中包括连续的M帧图像，所述M为大于1的整数；According to the order of the N frames of images in the video, the N frames of images are grouped to obtain at least two groups of images, where each group of images includes consecutive M frames of images, and M is an integer greater than 1;
    根据每组图像中每个图像中的所述第一对象的姿势类型,确定每组图像对应的姿势类型;Determine the posture type corresponding to each group of images according to the posture type of the first object in each image in each group of images;
    根据每组图像对应的姿势类型,确定所述第一对象的姿势分布。The posture distribution of the first object is determined according to the posture type corresponding to each group of images.
  4. 根据权利要求1-3任一项所述的方法,其特征在于,针对所述N帧图像中的任意的第一图像,确定所述第一图像中的所述第一对象的姿势类型,包括:The method according to any one of claims 1-3, wherein for any first image in the N frames of images, determining the posture type of the first object in the first image comprises :
    在所述第一图像中检测对象区域,所述对象区域中包括所述第一图像中与所述第一对象对应的部分;Detecting an object area in the first image, where the object area includes a part of the first image corresponding to the first object;
    对所述对象区域进行处理,以确定所述第一图像中的所述第一对象的姿势类型。The object area is processed to determine the posture type of the first object in the first image.
  5. 根据权利要求4所述的方法,其特征在于,在所述第一图像中检测对象区域,包括:The method of claim 4, wherein detecting an object area in the first image comprises:
    将表示所述第一图像的数据输入至第一识别模型，以获取所述对象区域；其中，所述第一识别模型为对多组第一样本进行学习得到的，每组第一样本包括样本图像和所述样本图像中的样本对象区域，所述样本图像中包括所述第一对象对应的图像。The data representing the first image is input into a first recognition model to obtain the object area; the first recognition model is obtained by learning multiple groups of first samples, each group of first samples including a sample image and a sample object area in the sample image, where the sample image includes an image corresponding to the first object.
  6. 根据权利要求4或5所述的方法,其特征在于,对所述对象区域进行处理,以确定所述第一图像中的所述第一对象的姿势类型,包括:The method according to claim 4 or 5, wherein processing the object area to determine the posture type of the first object in the first image comprises:
    将表示所述对象区域的数据输入至第二识别模型，以获取所述第一图像中的所述第一对象的姿势类型；其中，所述第二识别模型为对多组第二样本进行学习得到的，每组第二样本包括样本对象区域和在所述样本对象区域中识别得到的样本姿势类型，所述样本对象区域中包括所述第一对象对应的图像。The data representing the object area is input into a second recognition model to obtain the posture type of the first object in the first image; the second recognition model is obtained by learning multiple groups of second samples, each group of second samples including a sample object area and a sample posture type recognized in the sample object area, where the sample object area includes an image corresponding to the first object.
  7. The method according to claim 2, wherein the video is a video being shot, and acquiring N consecutive frames of images from the video comprises:
    acquiring N frames of to-be-processed images from the video, wherein the N frames of to-be-processed images include the last N frames already shot in the video;
    determining whether each of the N frames of to-be-processed images includes the first object, and if so, determining the N frames of to-be-processed images as the N frames of images.
  8. The method according to claim 7, wherein adding the target special effect to the video according to the N frames of images comprises:
    adding the special effect to the N-th frame of the N frames of images.
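For a video still being shot (claims 7-8), the last N shot frames form a sliding window: the effect is considered only when every frame in the window contains the first object, and it is then added to the newest, i.e. N-th, frame. A sketch under those assumptions (`contains_object` and `add_effect` are hypothetical callables standing in for the detector and the renderer):

```python
from collections import deque

def process_live_stream(frames, n, contains_object, add_effect):
    """Maintain the last n shot frames; when all of them contain the
    first object (claim 7), add the effect to the newest frame of the
    window (claim 8). Returns the indices of decorated frames."""
    window = deque(maxlen=n)  # always holds the last n frames already shot
    decorated = []
    for idx, frame in enumerate(frames):
        window.append(frame)
        if len(window) == n and all(contains_object(f) for f in window):
            add_effect(frame)  # the N-th, i.e. newest, frame of the window
            decorated.append(idx)
    return decorated

# Toy run: each frame is a boolean meaning "first object present", n = 3.
frames = [True, True, True, False, True, True, True]
print(process_live_stream(frames, 3, lambda f: f, lambda f: None))  # [2, 6]
```

Decorating only the newest frame fits the live setting: earlier frames of the window have already been displayed and can no longer be modified.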
  9. The method according to claim 2, wherein the video is a video whose shooting has been completed, and acquiring N consecutive frames of images from the video comprises:
    performing a to-be-processed image selection operation, which includes: acquiring N consecutive frames of to-be-processed images from the video, starting from a preset image of the video;
    performing an N-frame image determination operation, which includes: determining whether each of the N frames of to-be-processed images includes the image corresponding to the first object; if so, determining the N frames of to-be-processed images as the N frames of images; if not, updating the preset image to an image after the preset image in the video;
    repeating the to-be-processed image selection operation and the N-frame image determination operation until the N frames of images are determined.
  10. The method according to claim 9, wherein adding the target special effect to the video according to the N frames of images comprises:
    adding the special effect to at least one of the N frames of images.
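For a completed video (claims 9-10), the search starts at a preset frame, and whenever any frame in the current window lacks the first object, the preset frame advances and the check repeats until N consecutive qualifying frames are found. A sketch under those assumptions (`contains_object` is again a hypothetical detector; advancing by exactly one frame is an assumed reading of "an image after the preset image"):

```python
def find_n_consecutive(frames, n, start, contains_object):
    """Scan a finished video from the preset frame index `start` (claim 9):
    take n consecutive frames; if each contains the first object, return
    their indices; otherwise advance the preset frame and retry."""
    while start + n <= len(frames):
        window = frames[start:start + n]
        if all(contains_object(f) for f in window):
            return list(range(start, start + n))  # the determined N frames
        start += 1  # update the preset image to the next frame in the video
    return None  # no run of n qualifying frames after the preset image

# Toy run: each frame is a boolean meaning "first object present", n = 3.
frames = [True, False, True, True, True, False]
print(find_n_consecutive(frames, 3, 0, lambda f: f))  # [2, 3, 4]
```

Unlike the live case, every frame of the finished video can still be edited, which is why claim 10 allows the effect to be added to any one or more of the N frames.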
  11. The method according to any one of claims 1-10, wherein acquiring N consecutive frames of images from the video comprises:
    determining a special effect to be added to the video;
    determining the first object corresponding to the special effect to be added to the video;
    determining the N frames of images in the video according to the first object.
  12. The method according to claim 2, wherein before acquiring N consecutive frames of images from the video, the method further comprises:
    determining that the target special effect has not been added to the N frames of images.
  13. A video processing apparatus, comprising an acquisition module, a first determination module, a second determination module, and an addition module, wherein:
    the acquisition module is configured to acquire N consecutive frames of images from a video, each frame of the images including a first object, where N is an integer greater than 1;
    the first determination module is configured to determine the posture type of the first object in each frame of the images;
    the second determination module is configured to determine, according to the posture type of the first object in each frame of the images, a posture distribution of the first object, where the posture distribution indicates the change pattern of the posture of the first object;
    the addition module is configured to add a special effect to the video according to the posture distribution of the first object and the N frames of images.
  14. An electronic device, comprising at least one processor and a memory, wherein:
    the memory stores computer-executable instructions; and
    the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the video processing method according to any one of claims 1-12.
  15. A computer-readable storage medium, wherein the computer-readable storage medium stores computer-executable instructions which, when executed by a processor, implement the video processing method according to any one of claims 1-12.
PCT/CN2019/126757 2019-04-16 2019-12-19 Video processing method and apparatus, and device WO2020211422A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910304462.5 2019-04-16
CN201910304462.5A CN109889893A (en) 2019-04-16 2019-04-16 Method for processing video frequency, device and equipment

Publications (1)

Publication Number Publication Date
WO2020211422A1 true WO2020211422A1 (en) 2020-10-22

Family

ID=66937553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/126757 WO2020211422A1 (en) 2019-04-16 2019-12-19 Video processing method and apparatus, and device

Country Status (2)

Country Link
CN (1) CN109889893A (en)
WO (1) WO2020211422A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109889893A (en) * 2019-04-16 2019-06-14 北京字节跳动网络技术有限公司 Method for processing video frequency, device and equipment
CN110223325B (en) * 2019-06-18 2021-04-27 北京字节跳动网络技术有限公司 Object tracking method, device and equipment
CN112396676B (en) * 2019-08-16 2024-04-02 北京字节跳动网络技术有限公司 Image processing method, device, electronic equipment and computer-readable storage medium
CN111416991B (en) * 2020-04-28 2022-08-05 Oppo(重庆)智能科技有限公司 Special effect processing method and apparatus, and storage medium
CN112199016B (en) * 2020-09-30 2023-02-21 北京字节跳动网络技术有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112929743B (en) * 2021-01-22 2023-03-21 广州光锥元信息科技有限公司 Method and device for adding video special effect to specified object in video and mobile terminal

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004112112A (en) * 2002-09-13 2004-04-08 Sony Corp Information processing apparatus
US20130201328A1 (en) * 2012-02-08 2013-08-08 Hing Ping Michael CHUNG Multimedia processing as a service
CN107481327A (en) * 2017-09-08 2017-12-15 腾讯科技(深圳)有限公司 On the processing method of augmented reality scene, device, terminal device and system
CN108289180A (en) * 2018-01-30 2018-07-17 广州市百果园信息技术有限公司 Method, medium and the terminal installation of video are handled according to limb action
CN108833818A (en) * 2018-06-28 2018-11-16 腾讯科技(深圳)有限公司 video recording method, device, terminal and storage medium
CN109089058A (en) * 2018-07-06 2018-12-25 广州华多网络科技有限公司 Video pictures processing method, electric terminal and device
CN109889893A (en) * 2019-04-16 2019-06-14 北京字节跳动网络技术有限公司 Method for processing video frequency, device and equipment

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104902212B (en) * 2015-04-30 2019-05-10 努比亚技术有限公司 A kind of video communication method and device
US20160365116A1 (en) * 2015-06-11 2016-12-15 Yaron Galant Video editing apparatus with participant sharing
CN106385591B (en) * 2016-10-17 2020-05-15 腾讯科技(上海)有限公司 Video processing method and video processing device
CN109391792B (en) * 2017-08-03 2021-10-29 腾讯科技(深圳)有限公司 Video communication method, device, terminal and computer readable storage medium
CN108712661B (en) * 2018-05-28 2022-02-25 广州虎牙信息科技有限公司 Live video processing method, device, equipment and storage medium
CN109618183B (en) * 2018-11-29 2019-10-25 北京字节跳动网络技术有限公司 A kind of special video effect adding method, device, terminal device and storage medium
CN109462776B (en) * 2018-11-29 2021-08-20 北京字节跳动网络技术有限公司 Video special effect adding method and device, terminal equipment and storage medium


Also Published As

Publication number Publication date
CN109889893A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020211422A1 (en) Video processing method and apparatus, and device
US20210029305A1 (en) Method and apparatus for adding a video special effect, terminal device and storage medium
WO2020082870A1 (en) Real-time video display method and apparatus, and terminal device and storage medium
CN110070063B (en) Target object motion recognition method and device and electronic equipment
JP7199527B2 (en) Image processing method, device, hardware device
US20230421716A1 (en) Video processing method and apparatus, electronic device and storage medium
WO2020248900A1 (en) Panoramic video processing method and apparatus, and storage medium
CN109600559B (en) Video special effect adding method and device, terminal equipment and storage medium
CN114245028B (en) Image display method and device, electronic equipment and storage medium
CN110781823A (en) Screen recording detection method and device, readable medium and electronic equipment
CN110072047A (en) Control method, device and the hardware device of image deformation
WO2023185391A1 (en) Interactive segmentation model training method, labeling data generation method, and device
US20240119082A1 (en) Method, apparatus, device, readable storage medium and product for media content processing
CN111862349A (en) Virtual brush implementation method and device and computer readable storage medium
CN112258622B (en) Image processing method and device, readable medium and electronic equipment
US11880919B2 (en) Sticker processing method and apparatus
WO2023202590A1 (en) Page switching method and apparatus, and interaction method for terminal device
CN110022493B (en) Playing progress display method and device, electronic equipment and storage medium
CN113535105B (en) Media file processing method, device, equipment, readable storage medium and product
WO2021227953A1 (en) Image special effect configuration method, image recognition method, apparatuses, and electronic device
US12177534B2 (en) Method, system and device for playing effect in live room
CN116527993A (en) Video processing method, apparatus, electronic device, storage medium and program product
WO2025002075A1 (en) Video generation method and apparatus, electronic device, and storage medium
CN113129360B (en) Method and device for positioning object in video, readable medium and electronic equipment
WO2020207083A1 (en) Information sharing method and apparatus, and electronic device and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19924945

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.02.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19924945

Country of ref document: EP

Kind code of ref document: A1