CN110365903B - Video-based object processing method, device and equipment and readable storage medium

Info

Publication number: CN110365903B
Authority: CN (China)
Prior art keywords: key point, key, target object, adjusting, video
Legal status: Active (an assumption, not a legal conclusion)
Application number: CN201910676167.2A
Other languages: Chinese (zh)
Other versions: CN110365903A
Inventor: 田元
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Events: application filed by Tencent Technology Shenzhen Co Ltd; priority to CN201910676167.2A; publication of CN110365903A; application granted; publication of CN110365903B; anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G06T3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00: Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80: Camera processing pipelines; Components thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses a video-based object processing method, apparatus, device, and readable storage medium, relating to the field of video processing. The method comprises the following steps: acquiring a video stream of a video, wherein the video content of the video stream comprises a target object; performing key point tracking on the target object, wherein the target object corresponds to m key points for constructing a contour; receiving a key point adjustment signal; adjusting the position of a first key point according to the key point adjustment signal; and processing the contour region corresponding to the first key point on the target object according to the adjusted position of the first key point. By identifying the key points of the target object in the video stream to obtain the key points corresponding to the contour of the target object, and adjusting the first key point on the contour upon receiving the key point adjustment signal, the overall contour of the target object is adjusted, which improves both the accuracy of beautifying the target object and the efficiency of adjusting the target object as a whole.

Description

Video-based object processing method, device and equipment and readable storage medium
Technical Field
The present disclosure relates to the field of video processing, and in particular, to a video-based object processing method, apparatus, device, and readable storage medium.
Background
Video is a multimedia presentation form that carries a large amount of information, and includes movies, television shows, advertisement videos, short videos, ordinary videos, and other forms. A video may be shot with an object as the center of the shot, or with some other object as the center. When a video is shot with an object as the shooting center, there is usually a demand for beautifying that object.
In the related art, beautification of the object is provided in the form of lengthening the object's legs: after the client identifies the region corresponding to the legs of the object in the video, it lengthens the legs to the length corresponding to the adjustment degree selected by the user.
However, when the object is beautified in this way, only the leg length is adjusted, so the proportion between the leg length and the rest of the figure becomes inconsistent and the adjustment result is distorted; that is, the accuracy of beautifying the object in this way is low.
Disclosure of Invention
The embodiments of the present application provide a video-based object processing method, apparatus, device, and readable storage medium, which can solve the problem that, after the leg length is adjusted, the proportion between the leg length and the figure becomes inconsistent, the adjustment result is distorted, and the accuracy of beautifying the object is low. The technical solution is as follows:
in one aspect, a video-based object processing method is provided, and the method includes:
acquiring a video stream of the video, wherein the video content of the video stream comprises a target object;
carrying out key point tracking on the target object, wherein the target object corresponds to n key points, the n key points comprise m key points used for constructing the outline of the target object, and 0 < m ≤ n;
receiving a key point adjusting signal, wherein the key point adjusting signal is used for indicating that the contour of the target object is adjusted;
adjusting the position of a first key point in the m key points according to the key point adjusting signal, wherein the position is used for representing the relative position between the first key point and other key points;
and processing the contour region corresponding to the first key point on the target object according to the adjusted position of the first key point.
In another aspect, there is provided a video-based object processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring a video stream of the video, and the video content of the video stream comprises a target object;
the identification module is used for tracking key points of the target object, the target object corresponds to n key points, the n key points comprise m key points used for constructing the outline of the target object, and 0 < m ≤ n;
a receiving module, configured to receive a key point adjustment signal, where the key point adjustment signal is used to instruct to adjust a contour of the target object;
an adjusting module, configured to adjust a position of a first key point of the m key points according to the key point adjusting signal, where the position is used to indicate a relative position between the first key point and another key point;
the adjusting module is further configured to process a contour region corresponding to the first key point on the target object according to the adjusted position of the first key point.
In another aspect, a computer device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the video-based object processing method as provided in the embodiments of the present application.
In another aspect, there is provided a computer readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement a video-based object processing method as provided in an embodiment of the present application.
In another aspect, a computer program product is provided, which when run on a computer causes the computer to perform the video-based object processing method as described in the embodiments of the present application.
The beneficial effects brought by the technical scheme provided by the embodiment of the application at least comprise:
the key point identification is performed on the target object in the video stream to obtain the key points corresponding to the contour of the target object, and the first key point on the contour is adjusted upon receiving the key point adjustment signal, so that the overall contour of the target object is adjusted. This avoids the problems in the related art of an inconsistent figure proportion and low object-beautification accuracy caused by lengthening the legs, improves the accuracy of beautifying the target object, and improves the efficiency of adjusting the target object as a whole.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a schematic diagram of a keypoint obtained after performing keypoint identification on a target object according to an exemplary embodiment of the present application;
FIG. 2 is a diagram illustrating a related art method for beautifying an object according to an exemplary embodiment of the present application;
FIG. 3 is a schematic illustration of an implementation environment provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a method for video-based object processing according to an exemplary embodiment of the present application;
FIG. 5 is a schematic diagram of deriving a contour of a target object from keypoints provided based on the embodiment shown in FIG. 4;
FIG. 6 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application;
FIG. 7 is a schematic diagram of an adjustment process for keypoints provided based on the embodiment shown in FIG. 6;
FIG. 8 is a schematic diagram of adjusting the contour of the target object according to the adjustment result of the key points provided by the embodiment shown in FIG. 6;
FIG. 9 is a flowchart of a method for video-based object processing according to another exemplary embodiment of the present application;
FIG. 10 is a schematic diagram of a facial feature adjustment method provided based on the embodiment shown in FIG. 9;
FIG. 11 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application;
FIG. 12 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application;
fig. 13 is a block diagram of a video-based object processing apparatus according to an exemplary embodiment of the present application;
fig. 14 is a block diagram of a video-based object processing apparatus according to another exemplary embodiment of the present application;
fig. 15 is a block diagram of a terminal according to an exemplary embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
First, terms referred to in the embodiments of the present application are briefly described:
artificial Intelligence (AI): the method is a theory, method, technology and application system for simulating, extending and expanding human intelligence by using a digital computer or a machine controlled by the digital computer, sensing the environment, acquiring knowledge and obtaining the best result by using the knowledge. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the implementation method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
The artificial intelligence technology is a comprehensive subject and relates to the field of extensive technology, namely the technology of a hardware level and the technology of a software level. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, mechatronics, and the like. The artificial intelligence software technology mainly comprises a computer vision technology, a voice processing technology, a natural language processing technology, machine learning/deep learning and the like.
Key points: identification points detected in the process of object recognition for identifying characteristic parts, where the characteristic parts optionally include at least one of the object contour, parts of the object's arms, parts of the object's legs, and facial features. When the key points include facial-feature parts, the key points are usually located around the face contour, the eyebrows and eyes, the nose, the lips, the ears, and the like. Optionally, the key points at the facial features include at least one of face contour key points, eyebrow-eye key points, nose key points, lip key points, and ear key points; optionally, the eyebrow-eye key points include eyebrow key points and eye key points, the eye key points include upper-eyelid key points and lower-eyelid key points, and the lip key points include upper-lip key points and lower-lip key points.
Optionally, the key points may be detected by a key point detection algorithm, for example a Convolutional Neural Network (CNN) based key point regression method. Optionally, the number of key points may be set by the designer. Each key point corresponds to a key point identifier, and the two groups of key points identified in two frames of images for the same target object have a correspondence realized through these identifiers. Illustratively, key points 1 to 90 are identified in image A and key points 1 to 90 are identified in image B; then key point 1 in image A corresponds to key point 1 in image B, key point 17 in image A corresponds to key point 17 in image B, and key point 81 in image A corresponds to key point 81 in image B. A sketch of this identifier-based correspondence is given below.
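As an illustration of the identifier-based correspondence described above, the following minimal Python sketch pairs the key points of two frames by their shared identifiers; the dictionary layout and the sample coordinates are assumptions made for the example, not part of the patent.

```python
# Minimal sketch: pair key points across two frames by their identifiers.
# The dict layout (id -> (x, y)) and the sample coordinates are illustrative.

def match_keypoints(frame_a_points: dict, frame_b_points: dict) -> dict:
    """Map each shared key point identifier to its (position_a, position_b) pair."""
    shared_ids = frame_a_points.keys() & frame_b_points.keys()
    return {kp_id: (frame_a_points[kp_id], frame_b_points[kp_id])
            for kp_id in sorted(shared_ids)}

# Key points 1 to 90 would be identified in both images; three are shown here.
image_a = {1: (120, 40), 17: (95, 210), 81: (140, 480)}
image_b = {1: (122, 41), 17: (97, 212), 81: (139, 483)}
pairs = match_keypoints(image_a, image_b)
print(pairs[17])   # ((95, 210), (97, 212)): key point 17 in A matches 17 in B
```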
Optionally, the key points may be used in applications such as object beautification, face beautification, and three-dimensional reconstruction. Object beautification includes slimming, height adjustment, body-curve adjustment, and the like; face beautification includes face thinning, eye enlargement, eyebrow adjustment, and the like; face pendants are attached around organs according to the positions of the organs, for example, cat ears attached above the face contour; and three-dimensional reconstruction constructs a three-dimensional model according to the object contour, the body parts of the object, and the facial organs.
Referring to fig. 1, schematically, in the image 100, a target object 110 is included, the target object 110 corresponds to an arm 111, a leg 112, a waist 113, a chest 114, and a head 115, and detected key points 120 are distributed around a contour of the target object 110, where the contour includes a periphery of the arm 111, a periphery of the leg 112, a periphery of the chest 114, and a periphery of the head 115.
In the related art, when beautifying an object in a video, a beautification mode of lengthening the object's legs is provided: after the client identifies the region corresponding to the legs of the object in the video, it lengthens the legs to the length corresponding to the adjustment degree selected by the user. Referring schematically to fig. 2, a playing interface 200 of the video is displayed in the client. The playing interface 200 includes an object 210 whose whole body is kept within the playing interface 200, and further includes a special-effect trigger control 220. When the user selects the special-effect trigger control 220, the client lengthens the legs of the object 210 in the playing interface 200 and displays a lengthening-ratio control 230; the lengthening ratio is adjusted on the lengthening-ratio control 230, so that the degree of leg lengthening is controlled.
Optionally, an application scenario of the video-based object processing method provided in the embodiment of the present application includes at least one of the following scenarios:
firstly, the video-based object processing method may be applied to a camera application. During shooting, the camera application collects a video stream through the camera and detects the key points corresponding to the target object in the video stream, so as to beautify the target object by adjusting the key points, where the beautification adjustment includes adjusting parts of the contour of the target object;
secondly, the video-based object processing method may be applied to a live video application. When a user broadcasts live through the live video application, a video stream is collected through the terminal camera; the live video application detects the key points corresponding to the target object in the video stream and adjusts their positions in each video frame, thereby beautifying the target object, where the beautification adjustment includes adjusting parts of the contour of the target object. After the terminal sends the beautified live video stream to the server, the server forwards it to the watching terminal, and the watching terminal plays the beautified live video stream;
thirdly, the video-based object processing method may be applied to an instant messaging application. During a video call with a friend through the instant messaging application, the user collects a call video stream through the terminal camera; the instant messaging application detects the key points corresponding to the target object in the call video stream and adjusts their positions in each video frame, thereby beautifying the target object, where the beautification adjustment includes adjusting parts of the contour of the target object. After the terminal sends the beautified call video stream to the server, the server forwards it to the friend's terminal, which plays the beautified call video stream.
Optionally, the video-based object processing method provided in the embodiment of the present application may be applied to a terminal, and may also be applied to a scene where the terminal and a server interact with each other.
When the video-based object processing method is applied to a terminal, the terminal comprises a video processing module, and beautification processing is performed on a target object in a video stream through the video processing module, so that the beautified video stream can be obtained.
When the video-based object processing method is applied to a scenario where a terminal and a server interact with each other, fig. 3 is a schematic diagram of an implementation environment provided by an exemplary embodiment of the present application, as shown in fig. 3, the implementation environment includes a terminal 310, a server 320, and a communication network 330;
the terminal 310 is configured to collect a video stream through a camera, and send the collected video stream to the server 320 through the communication network 330; or, download the video stream, and send the downloaded video stream to the server 320 through the communication network 330; alternatively, a video stream pre-stored in the terminal 310 is transmitted to the server 320 through the communication network 330.
Optionally, at least one of the camera application, the live video application, and the instant messaging application is installed in the terminal 310.
Optionally, the server 320 includes a video processing module 321, and after the video processing module 321 identifies a key point of a target object in the video stream, the key point is adjusted according to a key point adjustment policy, so as to beautify the target object in the video stream. Optionally, the server 320 feeds back the beautified video stream to the terminal 310, and the terminal 310 plays the beautified video stream.
It should be noted that the terminal 310 may be implemented as a mobile terminal such as a mobile phone, a tablet, a smart watch, a laptop computer, or a music playing device, or as a terminal such as a desktop computer or a monitoring device, which is not limited in the embodiments of the present application.
The server 320 may be implemented as one server, or as a group of servers constructed by multiple servers, or as a physical server, or as a cloud server, which is not limited in this embodiment of the present application.
With reference to the above term introduction and application scenarios, the video-based object processing method provided in the embodiments of the present application is described below. Fig. 4 is a flowchart of the video-based object processing method provided in an exemplary embodiment of the present application, described by taking the application of the method in a terminal as an example; as shown in fig. 4, the method includes:
step 401, obtaining a video stream of a video, where the video content of the video stream includes a target object.
Optionally, the target object may be at least one of a character, an animal, an animation character, and a cartoon character, and the target object may also be implemented as at least one of a building, a plant, an ornament, and a daily article, which is not limited in this embodiment.
Optionally, the terminal may directly obtain the stored video stream, may also shoot the video stream through the camera, and may also download the video stream from the server, where the obtaining manner of the video stream is not limited in the present application.
Optionally, the video content of the video stream may include a single target object, or may include a plurality of target objects. Optionally, the target objects identifiable in the video stream at one time are limited to a preset number of objects; illustratively, if the first video frame of the video stream includes 8 objects and the preset number is 5, then 5 of the 8 objects are determined as the target objects.
Optionally, the manner of determining the target object includes any one of the following manners:
firstly, a target object is determined according to the size of the display area occupied by the object in a video frame;
Illustratively, when the preset number is 5 and the number of objects in the video frame is greater than 5, the 5 objects with the largest display area are determined as the target objects.
Secondly, determining a target object according to the display proportion of the object in the video frame;
the target object is implemented as a human figure, and the display scale is used to represent the percentage of the area of the human figure displayed in the video frame to the whole area, such as: the preset number is 3, the display ratio of the person a in the video frame is 45% (that is, 45% of the person a is displayed in the video frame), the display ratio of the person B in the video frame is 47%, the display ratio of the person C in the video frame is 85%, the display ratio of the person D in the video frame is 90%, and the display ratio of the person E in the video frame is 100%, and then the person C, the person D, and the person E are determined as the target persons.
And thirdly, randomly selecting objects corresponding to a preset number from all the objects of the video frame as target objects.
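As a sketch of the first two determination manners, the following Python fragment ranks detected objects by display area or display proportion and keeps the preset number; the DetectedObject fields and the concrete numbers are illustrative assumptions.

```python
# Hedged sketch of the first two target-object selection manners described
# above. The DetectedObject fields are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class DetectedObject:
    object_id: int
    display_area: float    # pixels occupied in the video frame
    display_ratio: float   # fraction of the figure visible in the frame, 0..1

def select_by_area(objects, preset_number=5):
    """Manner 1: keep the preset number of objects with the largest area."""
    ranked = sorted(objects, key=lambda o: o.display_area, reverse=True)
    return ranked[:preset_number]

def select_by_ratio(objects, preset_number=3):
    """Manner 2: keep the preset number of objects that are most fully shown."""
    ranked = sorted(objects, key=lambda o: o.display_ratio, reverse=True)
    return ranked[:preset_number]

objects = [DetectedObject(i, area, ratio) for i, (area, ratio) in
           enumerate([(9000, 0.45), (9500, 0.47), (12000, 0.85),
                      (15000, 0.90), (20000, 1.00)])]
targets = select_by_ratio(objects)      # persons C, D, E in the example above
print([t.object_id for t in targets])   # [4, 3, 2]
```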
Optionally, each frame in the video stream includes the target object, or a part of the video frames in the video stream includes the target object.
Step 402, performing key point tracking on a target object, wherein the target object corresponds to n key points, the n key points comprise m key points for constructing the outline of the target object, and 0 < m ≤ n.
Optionally, when performing key point tracking on a target object in a video stream, the key point tracking may be implemented by performing key point identification on the target object in each video frame, or by performing key point identification on the target object in key frames (I frames); the two manners are schematically described as follows:
firstly, carrying out key point identification on a target object in each frame of video frame to realize key point tracking;
optionally, each frame of video frame in the video stream is obtained, and a target object in each frame of video frame is identified, where a group of key point sets (including n key points) corresponding to the target object is identified and obtained for each frame of video frame, and there is a correspondence between n key points corresponding to each two frames of video frame, schematically, a key point set a is identified and obtained for video frame 1, and a key point set b is identified and obtained for video frame 2, where there is a correspondence between a key point in the key point set a and a key point in the key point set b, optionally, the correspondence is a one-to-one correspondence, that is, one key point in the key point set a corresponds to one key point in the key point set b. And sequentially acquiring each frame of video frame, and sequentially identifying the key points in each frame of video frame, thereby realizing the tracking of the key points.
Secondly, performing key point identification on a target object in a key frame (I frame) to realize key point tracking;
optionally, key frames in the video stream are obtained, and key point identification is performed on a target object in the key frames to obtain n key points corresponding to the target object in each frame of key frames, where n key points in different key frames have a corresponding relationship, and optionally, the corresponding relationship is a one-to-one corresponding relationship. Optionally, the key points in the non-key frames (P frame and B frame) corresponding to the I frame are determined according to the identified key points in the I frame. Optionally, the non-key frame is a video frame obtained according to the I frame and a change of a pixel point based on the I frame, and the position of the key point in the non-key frame is determined according to the key point identified in the I frame and a change (e.g., a position change) of the pixel point where the key point is located.
Optionally, the n key points include key points at the contour of the target object, and optionally, the n key points further include at least one of key points corresponding to the facial features of the target object and key points corresponding to the detail features of the target object.
Wherein the key points at the contour of the target object include at least one of leg contour key points, hip contour key points, waist contour key points, chest contour key points, arm contour key points, neck contour key points, and head contour key points of the target object; the key points corresponding to the facial features of the target object include at least one of eyebrow key points, nose key points, lip key points, and ear key points; and the key points corresponding to the detail features of the target object include at least one of finger key points, hairline key points, and ankle key points.
Optionally, the n key points include m key points distributed on the contour of the target object, where the m key points are used to construct the contour of the target object, and optionally, the m key points are sequentially connected in a curve connection manner to construct the contour of the target object.
Referring schematically to fig. 5, a target object 510 is included in an image frame 500; key points 511 distributed at the contour of the target object 510 are detected from the image frame 500 and connected in sequence by a smooth curve to obtain a line contour 512 corresponding to the target object, where the line contour 512 is the contour curve determined by the key points 511. That is, the line contour 512 changes correspondingly with the change of the positions of the key points 511. A sketch of this smooth-curve connection is given below.
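A minimal Python sketch of building the line contour by connecting the m contour key points with a smooth closed curve follows; a periodic B-spline from SciPy is used here as one way to realize the smooth-curve connection, and the sample key points are illustrative.

```python
# Minimal sketch: connect the m contour key points with a smooth closed
# curve, here a periodic B-spline from SciPy. The key points are illustrative.
import numpy as np
from scipy.interpolate import splev, splprep

def contour_from_keypoints(points, samples=200):
    """Fit a smooth closed curve through contour key points (m x 2 array)."""
    x, y = points[:, 0], points[:, 1]
    tck, _u = splprep([x, y], s=0, per=True)   # per=True closes the curve
    u_fine = np.linspace(0.0, 1.0, samples)
    cx, cy = splev(u_fine, tck)
    return np.column_stack([cx, cy])

keypoints = np.array([[100, 50], [160, 90], [170, 180],
                      [100, 230], [40, 170], [45, 90]], dtype=float)
contour = contour_from_keypoints(keypoints)
# Moving any key point and refitting regenerates the contour accordingly,
# which is the behavior relied on in step 405.
```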
Step 403, receiving a key point adjustment signal.
Optionally, the keypoint adjustment signal is used to indicate an adjustment to the contour of the target object. Optionally, the key point adjusting signal is further used for indicating that the face of the target object is adjusted, and/or the key point adjusting signal is further used for indicating that the detail feature of the target object is adjusted.
Optionally, the key point adjusting signal may be triggered according to a key point adjusting operation of a user, or may be automatically generated according to recording of a video stream by the user.
When the key point adjusting signal is triggered by the key point adjusting operation of the user, the user can directly drag the key point to adjust the key point, and can also select on the automatic beautifying control to trigger the automatic adjustment of the key point.
Step 404, adjusting the position of the first key point in the m key points according to the key point adjusting signal.
Optionally, the first keypoint may be implemented as one keypoint, a group of keypoints (e.g., a group of consecutive keypoints), or a plurality of groups of keypoints (e.g., a plurality of groups of consecutive keypoints).
Optionally, the position is used to represent a relative position between the first keypoint and other keypoints. Optionally, a relative position relationship exists between each key point and the adjacent key points, and the relative position relationship between the adjusted key point and the key point adjacent to the key point is adjusted according to the position adjustment of the first key point.
Optionally, the m key points have a left-right symmetric relationship on the object contour, that is, a point a on the left object contour corresponds to a point B on the right object contour, when the point a performs position adjustment according to the key point adjustment signal, the point B performs corresponding symmetric adjustment with the point a, for example, when the point a performs position adjustment to the left side, the point B performs position adjustment to the right side; when the position adjustment is performed upward at point a, the position adjustment is also performed upward at point B.
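The symmetric behavior just described can be sketched as follows; the symmetry-pair table and the assumption that the contour is symmetric about a vertical axis are illustrative.

```python
# Hedged sketch of the left-right symmetric adjustment described above: when
# point A on the left contour moves, its mirror point B moves symmetrically.
# The symmetry-pair table and the axis handling are illustrative assumptions.

def adjust_symmetrically(points, pairs, moved_id, dx, dy):
    """Apply (dx, dy) to the moved key point and mirror it onto its partner.

    points: dict of key point id -> [x, y]
    pairs:  dict mapping each left-contour id to its right-contour mirror id
    """
    points[moved_id][0] += dx
    points[moved_id][1] += dy
    mirror_id = pairs.get(moved_id)
    if mirror_id is not None:
        points[mirror_id][0] -= dx   # horizontal moves mirror left<->right
        points[mirror_id][1] += dy   # vertical moves are applied unchanged
    return points

points = {1: [80.0, 300.0], 2: [220.0, 300.0]}   # waist left / waist right
adjust_symmetrically(points, {1: 2}, moved_id=1, dx=-5.0, dy=0.0)
print(points)   # point 1 moved left by 5, point 2 moved right by 5
```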
Optionally, when the key point adjustment signal is generated according to a dragging operation, the position of the first key point is adjusted according to the dragging operation. When the key point adjustment operation is generated automatically, or generated by the user triggering an automatic beautification control, the position adjustment of the key points may also be realized by a preset adjustment model: the m key points are input into the preset adjustment model according to the key point adjustment signal, and the m adjusted key points are output by the preset adjustment model, where the m adjusted key points include the first key point with its position adjusted, and the preset adjustment model is trained on the key points corresponding to sample objects.
Step 405, processing the contour region corresponding to the first key point on the target object according to the adjusted position of the first key point.
Optionally, the first key point is a key point on the contour of the target object, and the contour of the target object is obtained by connecting the first key points located on the contour through a smooth curve, that is, when the position of the first key point changes, the contour obtained by connecting the first key points correspondingly changes, and then the position of the first key point is adjusted to drive the contour region corresponding to the first key point to perform corresponding adjustment.
In summary, in the video-based object processing method provided by this embodiment, the key points corresponding to the contour of the target object are obtained by performing key point identification on the target object in the video stream, and the first key points on the contour are adjusted by receiving the key point adjustment signal, so that the overall contour of the target object is adjusted, the problems of inconsistent figure proportion and low object beautification accuracy caused by lengthening the legs in the related art are avoided, the beautification accuracy for beautifying the target object is improved, and the efficiency for performing overall adjustment on the target object is improved.
In an alternative embodiment, the key point adjustment signal may be generated by receiving a dragging operation, and fig. 6 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application, which is exemplified by applying the method to a terminal, as shown in fig. 6, and the method includes:
step 601, acquiring a video stream of a video, wherein the video content of the video stream comprises a target object.
Optionally, the terminal may directly acquire the stored video stream, may also shoot the video stream through the camera, and may also download the video stream from the server, where the method for acquiring the video stream is not limited in the present application.
Optionally, the video content of the video stream may include a single target object or may include a plurality of target objects.
Optionally, the manner of acquiring the video stream is described in detail in step 401, and is not described herein again.
Step 602, performing key point tracking on a target object, where the target object has n corresponding key points, the n key points include m key points for constructing the contour of the target object, and 0 < m ≤ n.
Optionally, when performing the key point tracking on the target object in the video stream, the key point tracking may be performed by performing key point identification on the target object in each frame of the video frame, or performing key point identification on the target object in a key frame (I frame).
Optionally, the n key points include key points at the contour of the target object, and optionally, the n key points further include at least one of key points corresponding to the facial features of the target object and key points corresponding to the detail features of the target object.
Optionally, the n key points include m key points distributed on the contour of the target object, where the m key points are used to construct the contour of the target object, and optionally, the m key points are sequentially connected in a curve connection manner to construct the contour of the target object.
Step 603, receiving a dragging operation acting on the first key point.
Optionally, the start position of the dragging operation is located within a first preset range around the first key point. Optionally, the first preset range around the first key point is a circular range with the first key point as the center and a first distance as the radius. Optionally, each key point has a corresponding first preset range. Optionally, when there is no intersection between the first preset ranges and the start position of the received dragging operation is located within the first preset range of the first key point, the dragging operation is taken as a dragging operation on that first key point. Optionally, when an intersection exists between two adjacent first preset ranges and the start position of the dragging operation is located in the intersection of the two, the dragging operation is taken as a dragging operation on the two first key points corresponding to the intersection. A sketch of this hit test is given below.
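A minimal Python sketch of resolving which first key point(s) a dragging operation targets follows; the radius of the first preset range is an assumed value.

```python
# Sketch: the drag start must fall inside the circular "first preset range"
# around a key point. The radius value is an assumption for illustration.
import math

FIRST_RANGE_RADIUS = 20.0   # the "first distance" in pixels (assumed)

def keypoints_hit_by_drag(drag_start, keypoints, radius=FIRST_RANGE_RADIUS):
    """Return every key point whose preset range contains the drag start.

    If the start lies in the intersection of two adjacent ranges, both
    key points are returned and both are treated as drag targets.
    """
    sx, sy = drag_start
    hits = []
    for kp_id, (x, y) in keypoints.items():
        if math.hypot(sx - x, sy - y) <= radius:
            hits.append(kp_id)
    return hits

keypoints = {1: (100.0, 100.0), 2: (130.0, 100.0)}
print(keypoints_hit_by_drag((115.0, 100.0), keypoints))   # [1, 2]: overlap
```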
Optionally, the first key point is a key point for directly performing position adjustment according to the dragging signal, and the second key point located in a second preset range around the first key point is a key point for indirectly performing position adjustment under the driving of the first key point.
Optionally, the end position of the drag operation is a relative position with respect to the start position, that is, the end position of the drag operation is represented by the start position and a displacement of the start position; or, the end position of the drag operation is represented by coordinates in the terminal interface.
It should be noted that the n key points may be visible or invisible in a playing interface of the video stream, which is not limited in this embodiment of the application.
And step 604, generating a key point adjusting signal according to the dragging operation.
Optionally, the key point adjustment signal includes a first key point corresponding to the dragging operation and an end position of the dragging operation, or the key point adjustment signal includes the first key point and the dragging displacement.
Alternatively, the dragging operation may be an operation performed when the video stream stops at a certain frame of video frame, or may be an operation performed during the playing of the video stream. Optionally, when the dragging operation is an operation executed during the playing of the video stream, the key point adjusting signal is generated directly according to the dragging displacement of the dragging operation on the terminal display screen.
Step 605, adjusting the first key point of the m key points to the end position according to the key point adjusting signal.
Optionally, the manner of adjusting the first key point includes any one of the following manners:
firstly, adjusting the first key point to a coordinate point where the end position is located according to the end position in the key point adjusting signal;
illustratively, the starting position of the dragging operation is (10, 7), the ending position is (10, 8), and then the first key point corresponding to the dragging operation is determined according to the starting position, and then the first key point is adjusted to the ending position (10, 8).
Secondly, the end position comprises displacement information of the end position based on the initial position, and the position of the first key point is moved by a displacement corresponding to the end position according to the displacement information;
illustratively, the starting position of the dragging operation is (10, 7), the ending position is (10, 8), and then after the first key point (10.5, 7) corresponding to the dragging operation is determined according to the starting position, the first key point is moved by the position corresponding to the ending position, so as to obtain the adjusted first key point (10.5, 8).
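The two adjustment manners can be sketched side by side, reusing the coordinates of the document's own example:

```python
# The two adjustment manners above: absolute end coordinates versus a
# displacement relative to the drag start.

def adjust_absolute(first_keypoint, end_position):
    """Manner 1: snap the first key point to the end coordinates."""
    return end_position

def adjust_by_displacement(first_keypoint, start_position, end_position):
    """Manner 2: move the first key point by the drag displacement."""
    dx = end_position[0] - start_position[0]
    dy = end_position[1] - start_position[1]
    return (first_keypoint[0] + dx, first_keypoint[1] + dy)

# The document's example: drag from (10, 7) to (10, 8).
print(adjust_absolute((10.5, 7), (10, 8)))                  # (10, 8)
print(adjust_by_displacement((10.5, 7), (10, 7), (10, 8)))  # (10.5, 8)
```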
And 606, adjusting key points which are positioned in a second preset range around the first key point in the m key points and correspond to the first key point.
Optionally, since the contour of the target object is obtained by connecting the m key points with curves, when the distance between key points is small, a dragging operation on a single first key point has a corresponding influence on the second key points around it, and the degree of influence is related to the positional relationship between the second key point and the first key point.
Optionally, the direction from the start position to the end position of the dragging operation is determined as the dragging direction, and the distance between the start position and the end position is determined as the dragging distance. A second key point located within a second preset range around the first key point is determined, the positional relationship between the second key point and the first key point is determined, the dragging distance is scaled by the scaling ratio corresponding to that positional relationship to obtain a scaled distance, and the position of the second key point is adjusted according to the dragging direction and the scaled distance.
Optionally, the position relationship between the second key point and the first key point may include any one of the following relationships:
firstly, the position relationship is the distance relationship between the second key point and the first key point;
optionally, for a second key point located within a preset distance range around the first key point, determining a distance relationship between the second key point and the first key point, and scaling a scaling ratio corresponding to the dragging distance and the distance relationship. Optionally, the farther the distance is, the higher the scaling ratio is, and the smaller the size of the scaled distance obtained by scaling is; the closer the distance, the smaller the scaling ratio, and the larger the magnitude of the scaled distance obtained by scaling.
Second, the position relationship is a spacing relationship between the second key point and the first key point.
Optionally, for second key points around the first key point that are separated from it by fewer than a preset number of key points, the number of key points separating the second key point from the first key point is determined, and the dragging distance is scaled by the scaling ratio corresponding to that number. Optionally, the more key points in between, the higher the scaling ratio and the smaller the resulting scaled distance; the fewer key points in between, the smaller the scaling ratio and the larger the resulting scaled distance. A sketch of this spacing-based scaling is given below.
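A Python sketch of this spacing-based scaling follows; the falloff fractions are assumed values chosen so that key points separated by fewer key points move farther, as described above.

```python
# Hedged sketch of the second positional relationship: neighbours of the
# dragged key point move in the drag direction, scaled down by how many key
# points separate them from it. The falloff fractions are assumed values.
import math

FALLOFF = {0: 0.6, 1: 0.3}   # key points in between -> fraction of drag kept

def adjust_neighbors(points, order, dragged_id, start, end, falloff=FALLOFF):
    """Move the second key points along the drag direction with falloff.

    points: dict id -> [x, y]; order: contour-ordered list of key point ids.
    The dragged first key point itself is moved in step 605; wrap-around
    along the closed contour is ignored here for brevity.
    """
    dx, dy = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dy)
    if distance == 0.0:
        return points
    ux, uy = dx / distance, dy / distance           # dragging direction
    dragged_idx = order.index(dragged_id)
    for idx, kp_id in enumerate(order):
        spacing = abs(idx - dragged_idx) - 1        # key points in between
        if kp_id != dragged_id and spacing in falloff:
            scaled = distance * falloff[spacing]    # the scaled distance
            points[kp_id][0] += ux * scaled
            points[kp_id][1] += uy * scaled
    return points

points = {i: [float(i) * 10.0, 100.0] for i in range(1, 8)}
adjust_neighbors(points, order=list(range(1, 8)), dragged_id=4,
                 start=(40.0, 100.0), end=(40.0, 80.0))
# Key points 3 and 5 move 60% of the drag, 2 and 6 move 30%, others stay.
```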
Referring schematically to fig. 7, a target object 710 is included in an image frame 700, and key points 720 corresponding to the target object 710 are obtained after image recognition is performed on the image frame 700; the key points 720 are located around the contour of the target object 710. After a dragging operation is received on key point 721, key point 721 is position-adjusted according to the dragging operation, and key points 722, 723, 724, and 725 located around key point 721 are position-adjusted correspondingly. Since 0 key points lie between key points 722, 723 and key point 721, while 1 key point lies between key points 724, 725 and key point 721, the first displacement distance of key points 722 and 723 is greater than the second displacement distance of key points 724 and 725, and the first displacement distance of key points 722 and 723 is smaller than the displacement distance of key point 721.
Step 607, processing the contour region corresponding to the first key point on the target object according to the adjusted position of the first key point.
Optionally, the contour region corresponding to the first key point on the target object is subjected to contour adjustment according to the adjusted position of the first key point, and optionally, a contour obtained by connecting the adjusted first key point is used as the adjusted contour.
Referring to fig. 8, schematically, a target object 810 is included in an image frame 800, the target object 810 corresponds to a relevant key point 820, an adjusted key point 830 is obtained after a dragging operation is performed on the key point 820, and an adjusted contour 840 of the target object is obtained by combining a connection curve of the key point 830.
Optionally, the above steps 605 to 607 are executed for one frame or one group of video frames played after the dragging operation ends, and the video frames after that frame or group are adjusted correspondingly according to the position adjustment of the first key point and the second key point.
In summary, in the video-based object processing method provided by this embodiment, the key points corresponding to the contour of the target object are obtained by performing key point identification on the target object in the video stream, and the first key points on the contour are adjusted by receiving the key point adjustment signal, so that the overall contour of the target object is adjusted, the problems of inconsistent figure proportion and low object beautification accuracy caused by lengthening the legs in the related art are avoided, the beautification accuracy for beautifying the target object is improved, and the efficiency for performing overall adjustment on the target object is improved.
According to the method provided by the embodiment, the position of the key point on the contour of the target object is adjusted through the dragging operation, so that the contour of the target object is driven to be correspondingly adjusted, and the accuracy rate of adjusting the whole body curve of the target object is improved.
In an optional embodiment, the n key points further include k key points corresponding to the face of the target object, and the key point adjustment signal is further used to indicate that the face of the target object is adjusted, where 0 < k < n. Fig. 9 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application, which is described by taking an application of the method in a terminal as an example, and as shown in fig. 9, the method includes:
step 901, a video stream is obtained, and the video content of the video stream includes a target object.
Optionally, the terminal may directly acquire the stored video stream, may also shoot the video stream through the camera, and may also download the video stream from the server, where the method for acquiring the video stream is not limited in the present application.
Optionally, the video content of the video stream may include a single target object or may include a plurality of target objects.
Optionally, the manner of acquiring the video stream is described in detail in step 401, and is not described herein again.
Step 902, performing key point tracking on a target object in a video stream, wherein the target object corresponds to n key points, the n key points include k key points corresponding to the face of the target object, and 0 < k < n.
Optionally, when performing the key point tracking on the target object in the video stream, the key point tracking may be performed by performing key point identification on the target object in each frame of the video frame, or performing key point identification on the target object in a key frame (I frame).
Optionally, the n key points include k key points corresponding to the facial features of the target object. Optionally, the k key points include at least one of eyebrow key points, nose key points, lip key points, and ear key points.
Step 903, receiving a key point adjusting signal.
Optionally, the key point adjustment signal is used to indicate that the face of the target object is adjusted.
Optionally, the key point adjustment signal may be triggered according to a key point adjustment operation of a user, or may be automatically generated according to a recording of a video stream by the user.
When the key point adjusting signal is triggered by the key point adjusting operation of the user, the user can directly drag the key point to adjust the key point, and can also select on the automatic beautifying control to trigger the automatic adjustment of the key point.
Step 904, adjusting the position of a third key point in the k key points according to the key point adjusting signal.
Optionally, the adjusting the position of the third keypoint of the k keypoints according to the keypoint adjustment signal includes at least one of the following ways:
firstly, receiving a dragging operation on a third key point in the k key points, and adjusting the position of the third key point according to the dragging operation;
please refer to the above steps 603 to 606 for the specific dragging process.
Secondly, receiving a trigger operation on the face adjusting control, inputting the k key points into a face adjusting model according to the trigger operation, and outputting the adjusted k key points through the face adjusting model.
Step 905, adjusting the facial features corresponding to the third key points according to the adjusted positions of the third key points.
Optionally, at least one of eyebrow enlargement, nose shape adjustment, lip shape adjustment, and ear shape adjustment is performed on the face of the target object by adjusting the position of a third key point of the k key points.
Optionally, the adjustment manner for the k key points may also be adjusted as follows:
1. dividing k key points according to the facial feature parts to obtain a first key point set corresponding to each facial feature part;
optionally, after the k keypoints are divided according to facial features, a first keypoint set 1 corresponding to eyebrows, a first keypoint set 2 corresponding to noses, a first keypoint set 3 corresponding to lips, and a first keypoint set 4 corresponding to ears are obtained.
2. Receiving a first adjustment signal for a target feature, the first adjustment signal indicating independent adjustment of the target feature;
alternatively, the first adjustment signal may be a signal generated according to a drag operation at the target feature after the target feature is selected, or may be a signal generated by an adjustment operation on an adjustment control for the target feature.
3. Performing position adjustment on a fourth key point in the first key point set corresponding to the target feature part according to the first adjustment signal;
4. and adjusting the target characteristic part according to the adjusted position of the fourth key point.
Referring schematically to fig. 10, a part selection control is included in an adjustment interface 1010 of the video stream, where the part selection control includes a selection control 1011 corresponding to the eyebrows and eyes, a selection control 1012 corresponding to the nose, a selection control 1013 corresponding to the lips, and a selection control 1014 corresponding to the ears. When a selection operation on the selection control 1011 is received, the eyebrow-eye part 1020 of the target object is displayed in the adjustment interface 1010 and an eyebrow-eye adjustment control 1030 is displayed; parameters such as the eyebrow size, the distance between the eyes, and the pupil size are adjusted through the adjustment control 1030. Alternatively, a dragging operation is performed directly on the eyebrow-eye part 1020, and the size of the eyebrows is adjusted according to the dragging operation. A sketch of the feature-set division follows.
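Steps 1 to 3 above can be sketched as follows; the feature labels and the centroid-scaling adjustment are illustrative assumptions rather than the patent's prescribed operations.

```python
# Sketch of steps 1-3 above: divide the k facial key points into per-feature
# first key point sets, then independently adjust the selected target feature.
# The feature labels and the scaling adjustment are illustrative.

FACIAL_FEATURES = ("eyebrow_eye", "nose", "lip", "ear")

def divide_by_feature(keypoints):
    """keypoints: dict id -> (feature_label, [x, y]).
    Returns one first key point set per facial feature part."""
    sets = {feature: {} for feature in FACIAL_FEATURES}
    for kp_id, (feature, pos) in keypoints.items():
        sets[feature][kp_id] = pos
    return sets

def adjust_feature(feature_set, scale):
    """Independently scale one feature about its own centroid."""
    xs = [p[0] for p in feature_set.values()]
    ys = [p[1] for p in feature_set.values()]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    for pos in feature_set.values():
        pos[0] = cx + (pos[0] - cx) * scale
        pos[1] = cy + (pos[1] - cy) * scale
    return feature_set

keypoints = {10: ("nose", [100.0, 120.0]), 11: ("nose", [104.0, 128.0]),
             20: ("lip", [102.0, 150.0])}
sets = divide_by_feature(keypoints)
adjust_feature(sets["nose"], scale=1.1)   # e.g. a first adjustment signal
```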
In summary, in the video-based object processing method provided by this embodiment, the key points corresponding to the contour of the target object are obtained by performing key point identification on the target object in the video stream, and the first key points on the contour are adjusted by receiving the key point adjustment signal, so that the overall contour of the target object is adjusted, the problems of inconsistent figure proportion and low object beautification accuracy caused by lengthening the legs in the related art are avoided, the beautification accuracy for beautifying the target object is improved, and the efficiency for performing overall adjustment on the target object is improved.
According to the method provided by the embodiment, the key points of the target object in the video stream are identified to obtain k key points corresponding to the face of the target object, and the adjustment operation of the k key points is performed, so that the facial features of the target object are adjusted, and the accuracy and precision of beautifying the target object are improved.
In an alternative embodiment, the n key points may be directly divided according to body parts, and corresponding adjustments are performed for different body parts. Fig. 11 is a flowchart of a video-based object processing method provided in another exemplary embodiment of the present application, described by taking the application of the method in a terminal as an example; as shown in fig. 11, the method includes:
Step 1101, obtaining a video stream, wherein the video content of the video stream comprises a target object.
Optionally, the terminal may directly obtain the stored video stream, may also shoot the video stream through the camera, and may also download the video stream from the server, where the obtaining manner of the video stream is not limited in the present application.
Optionally, the video content of the video stream may include a single target object, or may include a plurality of target objects.
Optionally, the manner of acquiring the video stream is described in detail in step 401, and is not described herein again.
Step 1102, performing key point tracking on a target object in the video stream, where the target object corresponds to n key points, and n is a positive integer.
Optionally, when performing key point tracking on a target object in a video stream, the key point tracking may be performed by performing key point identification on the target object in each frame of video frame, or performing key point identification on the target object in a key frame (I frame).
Step 1103, dividing the n key points according to the body part to obtain a second key point set corresponding to each body part.
Optionally, the n key points are divided according to the body part to obtain at least one of a second key point set corresponding to the head (including key points corresponding to facial features), a second key point set corresponding to the neck, a second key point set corresponding to the chest, a second key point set corresponding to the arm, a second key point set corresponding to the waist, a second key point set corresponding to the hip, a second key point set corresponding to the leg, and the like.
Step 1104, receiving a second adjustment signal for the target body part.
Optionally, the second adjustment signal is used to indicate an independent adjustment of the target body part.
Alternatively, the second adjustment signal may be a signal generated according to a dragging operation at the target body part after the target body part is selected, or may be a signal generated by an adjustment operation on an adjustment control for the target body part.
Step 1105, adjusting the position of a fifth keypoint in the second keypoint set corresponding to the target body part according to the second adjustment signal.
Step 1106, adjusting the target body part on the target object according to the adjusted position of the fifth key point.
Illustratively, the adjustment interface of the video stream includes body part selection controls: a selection control a corresponding to the head, a selection control b corresponding to the neck, a selection control c corresponding to the chest, a selection control d corresponding to the waist, a selection control e corresponding to the arms, and a selection control f corresponding to the legs. When a selection operation on the selection control f corresponding to the legs is received, the leg region of the target object is displayed in the adjustment interface of the video stream together with leg adjustment controls, and parameters such as leg length and leg thickness are adjusted by operating the leg adjustment controls; alternatively, a dragging operation is performed directly on the leg region, and the leg length and leg thickness are adjusted according to the dragging operation, as in the sketch below.
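The following sketch illustrates the leg-length adjustment just described; the stretch rule is an assumption, as the present application does not specify how the control value maps onto the key points:

```python
# Illustrative only: lengthen or shorten the legs by scaling the vertical
# offsets of the leg key points from the top of the leg region.
import numpy as np

def adjust_leg_length(leg_points, length_factor):
    # leg_points: (k, 2) array of (x, y) key points, y increasing downward;
    # length_factor > 1 lengthens the legs, 0 < length_factor < 1 shortens them.
    pts = np.asarray(leg_points, dtype=np.float32).copy()
    hip_y = pts[:, 1].min()  # y coordinate of the top of the leg region
    pts[:, 1] = hip_y + (pts[:, 1] - hip_y) * length_factor
    return pts
```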
In summary, in the video-based object processing method provided by this embodiment, key points corresponding to the contour of the target object are obtained by performing key point identification on the target object in the video stream, and a first key point on the contour is adjusted upon receiving a key point adjustment signal, so that the overall contour of the target object is adjusted. This avoids the problems in the related art of inconsistent figure proportions and low object beautification accuracy caused by merely lengthening the legs, improves the accuracy of beautifying the target object, and improves the efficiency of adjusting the target object as a whole.
Fig. 12 is a flowchart of a video-based object processing method according to another exemplary embodiment of the present application, described by taking the case where the object is a target person and the method is applied to a terminal as an example. The method includes:
Step 1201, the client obtains figure and face information from the video stream through the terminal.
Optionally, after the client acquires the video stream captured by the terminal through the camera, the client performs key point identification on the target person in the video stream, and determines the figure and the facial features of the target person according to the identified key points.
Step 1202, the client beautifies the figure and the face in the video stream through a video beautification strategy to obtain beautified video data.
Optionally, the video beautification strategy includes at least one of: performing a dragging operation on the video to beautify the target person, and triggering an automatic beautification control to beautify the target person.
Step 1203, the client uploads the beautified video data to a server, so that an audience terminal receives the beautified video data sent by the server.
Optionally, description is given by taking the client as a live broadcast application program as an example: after beautifying the live video stream, the client uploads the beautified live video stream to the server, and the server sends the beautified live video stream to the audience terminal for playing.
Step 1204, the audience terminal receives the video message sent by the server and plays the beautified video data.
Fig. 13 is a block diagram of a structure of a video-based object processing apparatus according to an exemplary embodiment of the present application, where as shown in fig. 13, the apparatus includes: an acquisition module 1310, an identification module 1320, a receiving module 1330, and an adjustment module 1340;
an obtaining module 1310, configured to obtain a video stream of the video, where video content of the video stream includes a target object;
an identifying module 1320, configured to perform key point tracking on the target object, where the target object corresponds to n key points, the n key points include m key points used to construct an outline of the target object, and m is greater than 0 and is equal to or less than n;
a receiving module 1330, configured to receive a key point adjusting signal, where the key point adjusting signal is used to instruct to adjust the contour of the target object;
an adjusting module 1340, configured to adjust a position of a first keypoint of the m keypoints according to the keypoint adjusting signal, where the position is used to represent a relative position between the first keypoint and another keypoint;
the adjusting module 1340 is further configured to process a contour region corresponding to the first key point on the target object according to the adjusted position of the first key point.
In an optional embodiment, the receiving module 1330 is further configured to receive a dragging operation applied to the first key point, where a starting position of the dragging operation is located within a first preset range around the first key point; and generating the key point adjusting signal according to the dragging operation, wherein the key point adjusting signal comprises the first key point corresponding to the dragging operation and the end position of the dragging operation.
In an optional embodiment, the adjusting module 1340 is further configured to adjust a first keypoint of the m keypoints to the end position according to the keypoint adjustment signal; and adjusting key points which are positioned in a second preset range around the first key point in the m key points and correspond to the first key point.
In an alternative embodiment, as shown in fig. 14, the adjusting module 1340 includes:
a determining submodule 1341, configured to determine a direction in which the start position moves to the end position in the dragging operation as a dragging direction, and determine a distance between the start position and the end position as a dragging distance;
the determining submodule 1341 is further configured to determine, for a second keypoint located within a second preset range around the first keypoint, a position relationship between the second keypoint and the first keypoint;
the determining submodule 1341 is further configured to zoom the dragging distance according to a zooming ratio corresponding to the position relationship, so as to obtain a zooming distance;
an adjusting submodule 1342, configured to adjust the position of the second keypoint by the dragging direction and the zooming distance.
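The cooperation of the determining submodule 1341 and the adjusting submodule 1342 may be sketched as follows; the linear falloff used as the zoom ratio is an assumption for illustration, since the present application only states that the ratio corresponds to the position relationship:

```python
# Sketch: move the first key point to the drag end position and shift each
# second key point within the preset range by a zoomed drag distance.
import numpy as np

def apply_drag(keypoints, first_idx, start, end, preset_range):
    pts = np.asarray(keypoints, dtype=np.float32).copy()
    drag_vec = np.asarray(end, np.float32) - np.asarray(start, np.float32)
    drag_dist = float(np.linalg.norm(drag_vec))       # dragging distance
    if drag_dist == 0.0:
        return pts
    direction = drag_vec / drag_dist                  # dragging direction
    first = pts[first_idx].copy()
    for i in range(len(pts)):
        if i == first_idx:
            pts[i] = end                              # first key point -> end position
            continue
        d = float(np.linalg.norm(pts[i] - first))     # position relationship
        if d <= preset_range:
            ratio = 1.0 - d / preset_range            # assumed zoom ratio
            pts[i] = pts[i] + direction * drag_dist * ratio  # zoomed distance
    return pts
```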
In an alternative embodiment, the adjusting module 1340 includes:
an input submodule 1343, configured to input the m key points into a preset adjustment model according to the key point adjustment signal;
an output submodule 1344, configured to obtain the adjusted m key points through the output of the preset adjustment model, where the preset adjustment model is trained on key points corresponding to sample objects.
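A minimal sketch of such a preset adjustment model is given below, assuming a small fully-connected network in PyTorch; the architecture and framework are assumptions, as the present application only states that the model is trained on key points of sample objects:

```python
# Hypothetical preset adjustment model: maps m (x, y) key points to their
# adjusted positions.
import torch
import torch.nn as nn

class PresetAdjustModel(nn.Module):
    def __init__(self, m):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * m, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 2 * m),
        )

    def forward(self, keypoints):          # keypoints: (batch, m, 2)
        flat = keypoints.flatten(1)        # (batch, 2m)
        return self.net(flat).view_as(keypoints)
```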
In an optional embodiment, the n key points further include k key points corresponding to the face of the target object, and the key point adjustment signal is further used to indicate that the face of the target object is adjusted, where 0 < k < n;
the adjusting module 1340 is further configured to adjust the position of a third key point of the k key points according to the key point adjustment signal; and adjust the facial features corresponding to the third key point according to the adjusted position of the third key point.
In an optional embodiment, the apparatus further comprises:
a dividing module 1350, configured to divide the k keypoints according to facial feature parts to obtain a first keypoint set corresponding to each facial feature part;
the receiving module 1330 is further configured to receive a first adjustment signal for a target feature, where the first adjustment signal is used to indicate that the target feature is independently adjusted;
the adjusting module 1340 is further configured to adjust the position of a fourth key point in the first key point set corresponding to the target feature part according to the first adjustment signal; and adjust the target feature part according to the adjusted position of the fourth key point.
In an optional embodiment, the apparatus further comprises:
a dividing module 1350, configured to divide the n keypoints according to body parts to obtain a second keypoint set corresponding to each body part;
the receiving module 1330, further configured to receive a second adjustment signal for a target body part, where the second adjustment signal is used to indicate that the target body part is independently adjusted;
the adjusting module 1340 is further configured to adjust the position of a fifth key point in the second key point set corresponding to the target body part according to the second adjustment signal; and adjust the target body part according to the adjusted position of the fifth key point.
In an optional embodiment, the obtaining module 1310 is further configured to obtain a key frame in the video stream;
the identifying module 1320 is further configured to perform key point identification on the target object in the key frames to obtain n key points corresponding to the target object in each key frame, where the n key points in different key frames have a corresponding relationship.
In summary, the video-based object processing apparatus provided by this embodiment obtains key points corresponding to the contour of the target object by performing key point identification on the target object in the video stream, and adjusts a first key point on the contour upon receiving a key point adjustment signal, thereby adjusting the overall contour of the target object. This avoids the problems in the related art of inconsistent figure proportions and low object beautification accuracy caused by merely lengthening the legs, improves the accuracy of beautifying the target object, and improves the efficiency of adjusting the target object as a whole.
It should be noted that: the video-based object processing apparatus provided in the foregoing embodiment is only illustrated by the division of the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the video-based object processing apparatus provided in the foregoing embodiment and the video-based object processing method embodiment belong to the same concept, and specific implementation processes thereof are described in detail in the method embodiment, and are not described herein again.
Fig. 15 shows a block diagram of a terminal 1500 according to an exemplary embodiment of the present application. The terminal 1500 may be: a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal 1500 may also be referred to as user equipment, a portable terminal, a laptop terminal, a desktop terminal, or by other names.
In general, terminal 1500 includes: a processor 1501 and a memory 1502.
Processor 1501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, or the like. The processor 1501 may be implemented in at least one hardware form of a DSP (Digital Signal Processing), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). Processor 1501 may also include a main processor and a coprocessor, where the main processor is a processor for Processing data in an awake state, and is also referred to as a Central Processing Unit (CPU); a coprocessor is a low power processor for processing data in a standby state. In some embodiments, the processor 1501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content required to be displayed on the display screen. In some embodiments, processor 1501 may also include an AI (Artificial Intelligence) processor for processing computational operations related to machine learning.
The memory 1502 may include one or more computer-readable storage media, which may be non-transitory. The memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 1502 is used to store at least one instruction for execution by processor 1501 to implement the video-based object processing methods provided by method embodiments herein.
In some embodiments, the terminal 1500 may further include: a peripheral interface 1503 and at least one peripheral. The processor 1501, memory 1502, and peripheral interface 1503 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 1503 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 1504, touch screen display 1505, camera 1506, audio circuitry 1507, positioning assembly 1508, and power supply 1509.
The peripheral interface 1503 may be used to connect at least one peripheral related to I/O (Input/Output) to the processor 1501 and the memory 1502. In some embodiments, the processor 1501, memory 1502, and peripheral interface 1503 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 1501, the memory 1502, and the peripheral interface 1503 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.
The Radio Frequency circuit 1504 is used to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuitry 1504 communicates with communication networks and other communication devices via electromagnetic signals. The radio frequency circuit 1504 converts an electrical signal into an electromagnetic signal to transmit, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 1504 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 1504 can communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 1504 may also include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 1505 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 1505 is a touch display screen, the display screen 1505 also has the ability to capture touch signals on or over its surface. The touch signal may be input to the processor 1501 as a control signal for processing. In this case, the display screen 1505 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 1505, providing the front panel of the terminal 1500; in other embodiments, there may be at least two display screens 1505, each disposed on a different surface of the terminal 1500 or in a folded design; in still other embodiments, the display screen 1505 may be a flexible display disposed on a curved or folded surface of the terminal 1500. Further, the display screen 1505 may be configured in a non-rectangular irregular pattern, i.e., an irregularly shaped screen. The display screen 1505 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
Camera assembly 1506 is used to capture images or video. Optionally, the camera assembly 1506 includes a front camera and a rear camera. Generally, a front camera is disposed at a front panel of a terminal, and a rear camera is disposed at a rear surface of the terminal. In some embodiments, the number of the rear cameras is at least two, and each rear camera is any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that the main camera and the depth-of-field camera are fused to realize a background blurring function, the main camera and the wide-angle camera are fused to realize panoramic shooting and a VR (Virtual Reality) shooting function or other fusion shooting functions. In some embodiments, camera assembly 1506 may also include a flash. The flash lamp can be a monochrome temperature flash lamp or a bicolor temperature flash lamp. The double-color-temperature flash lamp is a combination of a warm-light flash lamp and a cold-light flash lamp, and can be used for light compensation at different color temperatures.
The audio circuitry 1507 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert the sound waves into electrical signals, and input them to the processor 1501 for processing, or to the radio frequency circuit 1504 to realize voice communication. For stereo capture or noise reduction purposes, multiple microphones may be provided, each at a different location of the terminal 1500. The microphone may also be an array microphone or an omnidirectional acquisition microphone. The speaker is used to convert electrical signals from the processor 1501 or the radio frequency circuit 1504 into sound waves. The speaker may be a traditional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal into sound waves audible to humans, or into sound waves inaudible to humans for purposes such as distance measurement. In some embodiments, the audio circuitry 1507 may also include a headphone jack.
The positioning component 1508 is used to locate the current geographic position of the terminal 1500 for navigation or LBS (Location Based Service). The positioning component 1508 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, or the Galileo system of the European Union.
Power supply 1509 is used to power the various components in terminal 1500. The power supply 1509 may be alternating current, direct current, disposable or rechargeable. When the power supply 1509 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
In some embodiments, the terminal 1500 also includes one or more sensors 1510. The one or more sensors 1510 include, but are not limited to: acceleration sensor 1511, gyro sensor 1512, pressure sensor 1513, fingerprint sensor 1514, optical sensor 1515, and proximity sensor 1516.
The acceleration sensor 1511 may detect the magnitude of acceleration on three coordinate axes of the coordinate system established with the terminal 1500. For example, the acceleration sensor 1511 may be used to detect components of the gravitational acceleration in three coordinate axes. The processor 1501 may control the touch screen 1505 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 1511. The acceleration sensor 1511 may also be used for acquisition of motion data of a game or a user.
The gyroscope sensor 1512 can detect the body direction and the rotation angle of the terminal 1500, and the gyroscope sensor 1512 and the acceleration sensor 1511 cooperate to collect the 3D motion of the user on the terminal 1500. The processor 1501, based on the data collected by the gyroscope sensor 1512, may implement the following functions: motion sensing (such as changing the UI according to a user's tilting operation), image stabilization while shooting, game control, and inertial navigation.
Pressure sensor 1513 may be disposed on a side bezel of terminal 1500 and/or underneath touch display 1505. When the pressure sensor 1513 is disposed on the side frame of the terminal 1500, the holding signal of the user to the terminal 1500 may be detected, and the processor 1501 performs left-right hand recognition or shortcut operation according to the holding signal collected by the pressure sensor 1513. When the pressure sensor 1513 is disposed at a lower layer of the touch display 1505, the processor 1501 controls the operability control on the UI interface according to the pressure operation of the user on the touch display 1505. The operability control comprises at least one of a button control, a scroll bar control, an icon control and a menu control.
The fingerprint sensor 1514 is configured to capture a fingerprint of the user, and the processor 1501 identifies the user based on the fingerprint captured by the fingerprint sensor 1514, or the fingerprint sensor 1514 identifies the user based on the captured fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 1501 authorizes the user to perform relevant sensitive operations including unlocking the screen, viewing encrypted information, downloading software, paying, and changing settings, etc. The fingerprint sensor 1514 may be disposed on the front, back, or side of the terminal 1500. When a physical key or vendor Logo is provided on the terminal 1500, the fingerprint sensor 1514 may be integrated with the physical key or vendor Logo.
The optical sensor 1515 is used to collect ambient light intensity. In one embodiment, processor 1501 may control the brightness of the display on touch screen 1505 based on the intensity of ambient light collected by optical sensor 1515. Specifically, when the ambient light intensity is high, the display brightness of the touch display screen 1505 is increased; when the ambient light intensity is low, the display brightness of the touch display screen 1505 is turned down. In another embodiment, the processor 1501 may also dynamically adjust the shooting parameters of the camera assembly 1506 according to the ambient light intensity collected by the optical sensor 1515.
A proximity sensor 1516, also known as a distance sensor, is typically disposed on the front panel of the terminal 1500. The proximity sensor 1516 is used to collect the distance between the user and the front surface of the terminal 1500. In one embodiment, when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal 1500 gradually decreases, the processor 1501 controls the touch display 1505 to switch from the screen-on state to the screen-off state; when the proximity sensor 1516 detects that the distance between the user and the front surface of the terminal 1500 gradually increases, the processor 1501 controls the touch display 1505 to switch from the screen-off state to the screen-on state.
Those skilled in the art will appreciate that the configuration shown in fig. 15 does not constitute a limitation of terminal 1500, and may include more or fewer components than shown, or some components may be combined, or a different arrangement of components may be employed.
The embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores at least one instruction, at least one program, a code set, or an instruction set, and the at least one instruction, the at least one program, the code set, or the instruction set is loaded and executed by the processor to implement the above video-based object processing method.
An embodiment of the present application further provides a computer-readable storage medium, in which at least one instruction, at least one program, a code set, or a set of instructions is stored, and the at least one instruction, the at least one program, the code set, or the set of instructions is loaded and executed by the processor to implement the above video-based object processing method.
The present application further provides a computer program product, which when run on a computer, causes the computer to execute the video-based object processing method provided by the above-mentioned method embodiments.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, which may be a computer-readable storage medium contained in the memory of the above embodiments, or a separate computer-readable storage medium not incorporated in the terminal. The computer-readable storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the above video-based object processing method.
Optionally, the computer-readable storage medium may include: a Read Only Memory (ROM), a Random Access Memory (RAM), a Solid State Drive (SSD), or an optical disc. The Random Access Memory may include a resistive Random Access Memory (ReRAM) and a Dynamic Random Access Memory (DRAM). The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (13)

1. A method for video-based object processing, the method comprising:
acquiring a video stream of the video, wherein the video content of the video stream comprises a target object;
performing keypoint tracking on the target object of the video stream, wherein the target object corresponds to n keypoints, and n is a positive integer, and the method includes: acquiring key frames in the video stream, and performing key point identification on the target object in the key frames to obtain n key points corresponding to the target object in each key frame, wherein different key points correspond to different key point identifiers, and the key points in different key frames have corresponding relations through the key point identifiers; determining the position of the key point in a non-key frame according to the key point identified from the key frame and the change condition of the pixel point of the key point in the key frame, so as to realize the tracking of the key point;
dividing the n key points according to body parts to obtain a second key point set corresponding to each body part;
receiving a second adjustment signal for a target body part, the second adjustment signal indicating an independent adjustment of a contour of the target body part;
performing position adjustment on a fifth key point in the second key point set corresponding to the target body part according to the second adjustment signal;
adjusting the target body part according to the position of the fifth key point after adjustment;
based on the tracking of the fifth key point in the key frame and the non-key frame, adjusting the contour region of the target body part on the target object in the key frame and the non-key frame in the video stream according to the adjusted position of the fifth key point.
2. The method of claim 1, further comprising:
receiving a dragging operation acting on a first key point among m key points, wherein a starting position of the dragging operation is located within a first preset range around the first key point, the m key points sequentially construct the contour of the target object in a curve connection manner, and 0 < m ≤ n;
and generating a key point adjusting signal according to the dragging operation, wherein the key point adjusting signal comprises the first key point corresponding to the dragging operation and the end position of the dragging operation, and the key point adjusting signal is used for indicating the adjustment of the outline of the target object.
3. The method of claim 2, further comprising:
adjusting the first key point to the end position according to the key point adjusting signal;
and adjusting key points which are positioned in a second preset range around the first key point in the m key points and correspond to the first key point.
4. The method according to claim 3, wherein the adjusting of the keypoints located within a second preset range around the first keypoint comprises:
determining a direction in which the starting position moves to the ending position in the dragging operation as a dragging direction, and determining a distance between the starting position and the ending position as a dragging distance;
determining a position relation between a second key point and a first key point for the second key point located in a second preset range around the first key point;
zooming the dragging distance according to a zooming proportion corresponding to the position relation to obtain a zooming distance;
and adjusting the position of the second key point according to the dragging direction and the zooming distance.
5. The method of claim 1, further comprising:
receiving a keypoint adjustment signal, the keypoint adjustment signal being indicative of an adjustment to a contour of the target object;
inputting m key points into a preset adjustment model according to the key point adjustment signal, wherein the m key points sequentially construct the contour of the target object in a curve connection mode, and m is more than 0 and less than or equal to n;
and outputting the adjusted m key points through the preset adjustment model, wherein the preset adjustment model is obtained through training of the key points corresponding to the sample object.
6. The method of any one of claims 1 to 5, wherein the n keypoints further comprise k keypoints corresponding to the face of the target object, and the keypoint adjustment signal is further used for indicating an adjustment to the face of the target object, 0 < k < n;
the method further comprises the following steps:
adjusting the position of a third key point in the k key points;
and adjusting the facial features corresponding to the third key points according to the adjusted positions of the third key points.
7. The method of claim 6, further comprising:
dividing the k key points according to the facial feature parts to obtain a first key point set corresponding to each facial feature part;
receiving a first adjustment signal for a target feature, the first adjustment signal indicating an independent adjustment to the target feature;
adjusting the position of a fourth key point in the first key point set corresponding to the target feature part according to the first adjusting signal;
and adjusting the target characteristic part according to the adjusted position of the fourth key point.
8. An apparatus for video-based object processing, the apparatus comprising:
the acquisition module is used for acquiring the video stream of the video, and the video content of the video stream comprises a target object;
a recognition module, configured to perform keypoint tracking on the target object of the video stream, where the target object corresponds to n keypoints, n is a positive integer, and the recognition module includes: acquiring key frames in the video stream, and performing key point identification on the target object in the key frames to obtain n key points corresponding to the target object in each frame of key frames, wherein different key points correspond to different key point identifiers, and the key points in different key frames have corresponding relations through the key point identifiers; determining the position of the key point in a non-key frame according to the key point identified from the key frame and the change condition of the pixel point of the key point in the key frame, so as to realize the tracking of the key point;
the dividing module is used for dividing the n key points according to body parts to obtain a second key point set corresponding to each body part;
a receiving module for receiving a second adjustment signal for a target body part, the second adjustment signal being indicative of an independent adjustment of a contour of the target body part;
an adjusting module, configured to perform position adjustment on a fifth keypoint in the second keypoint set corresponding to the target body part according to the second adjusting signal;
the adjusting module is further configured to adjust the target body part according to the adjusted position of the fifth key point;
means for adjusting the contour region of the target body part on the target object in key frames and non-key frames in the video stream according to the adjusted position of the fifth keypoint based on the tracking of the fifth keypoint in the key frames and the non-key frames.
9. The apparatus according to claim 8, wherein the receiving module is further configured to receive a dragging operation applied to a first key point of m key points, a starting position of the dragging operation is located within a first preset range around the first key point, the m key points sequentially construct the contour of the target object in a curve connection manner, where 0 < m ≦ n; and generating a key point adjusting signal according to the dragging operation, wherein the key point adjusting signal comprises the first key point corresponding to the dragging operation and the end position of the dragging operation, and the key point adjusting signal is used for indicating the adjustment of the outline of the target object.
10. The apparatus of claim 9, wherein the adjusting module is further configured to adjust a first keypoint of the m keypoints to the end position according to the keypoint adjustment signal; and adjusting key points which are positioned in a second preset range around the first key point in the m key points and correspond to the first key point.
11. The apparatus of claim 10, wherein the adjustment module comprises:
the determining submodule is used for determining the direction of the starting position moving to the ending position in the dragging operation as a dragging direction, and the distance between the starting position and the ending position is determined as a dragging distance;
the determining submodule is further configured to determine a position relationship between a second key point located in a second preset range around the first key point and the first key point;
the determining submodule is further configured to scale the dragging distance according to a scaling ratio corresponding to the position relationship, so as to obtain a scaling distance;
and the adjusting submodule is used for adjusting the position of the second key point according to the dragging direction and the zooming distance.
12. A computer device comprising a processor and a memory, wherein at least one program is stored in the memory, and wherein the at least one program is loaded and executed by the processor to implement the video-based object processing method according to any one of claims 1 to 7.
13. A computer-readable storage medium, in which at least one program is stored, the at least one program being loaded and executed by a processor to implement the video-based object processing method according to any one of claims 1 to 7.
CN201910676167.2A 2019-07-25 2019-07-25 Video-based object processing method, device and equipment and readable storage medium Active CN110365903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910676167.2A CN110365903B (en) 2019-07-25 2019-07-25 Video-based object processing method, device and equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN110365903A CN110365903A (en) 2019-10-22
CN110365903B (en) 2022-11-29

Family

ID=68221728

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910676167.2A Active CN110365903B (en) 2019-07-25 2019-07-25 Video-based object processing method, device and equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110365903B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105321147A (en) * 2014-06-25 2016-02-10 腾讯科技(深圳)有限公司 Image processing method and apparatus
CN108470322A (en) * 2018-03-09 2018-08-31 北京小米移动软件有限公司 Handle the method, apparatus and readable storage medium storing program for executing of facial image
CN109242765A (en) * 2018-08-31 2019-01-18 腾讯科技(深圳)有限公司 A kind of face image processing process, device and storage medium
CN109325907A (en) * 2018-09-18 2019-02-12 北京旷视科技有限公司 Image landscaping treatment method, apparatus and system
CN109376575A (en) * 2018-08-20 2019-02-22 奇酷互联网络科技(深圳)有限公司 Method, mobile terminal and the storage medium that human body in image is beautified
CN109461117A (en) * 2018-10-30 2019-03-12 维沃移动通信有限公司 A kind of image processing method and mobile terminal
CN110049351A (en) * 2019-05-23 2019-07-23 北京百度网讯科技有限公司 The method and apparatus of Facial metamorphosis, electronic equipment, computer-readable medium in video flowing

Also Published As

Publication number Publication date
CN110365903A (en) 2019-10-22

Similar Documents

Publication Publication Date Title
CN110189340B (en) Image segmentation method and device, electronic equipment and storage medium
CN110992493B (en) Image processing method, device, electronic equipment and storage medium
CN109977775B (en) Key point detection method, device, equipment and readable storage medium
CN112533017B (en) Live broadcast method, device, terminal and storage medium
CN112907725B (en) Image generation, training of image processing model and image processing method and device
CN110163066B (en) Multimedia data recommendation method, device and storage medium
CN111028144B (en) Video face changing method and device and storage medium
CN109947338B (en) Image switching display method and device, electronic equipment and storage medium
CN110533585B (en) Image face changing method, device, system, equipment and storage medium
CN110263617B (en) Three-dimensional face model obtaining method and device
US20220174206A1 (en) Camera movement control method and apparatus, device, and storage medium
CN110572711A (en) Video cover generation method and device, computer equipment and storage medium
CN109035180A (en) Video broadcasting method, device, equipment and storage medium
CN110956580B (en) Method, device, computer equipment and storage medium for changing face of image
CN110662105A (en) Animation file generation method and device and storage medium
CN109978996B (en) Method, device, terminal and storage medium for generating expression three-dimensional model
CN113038165B (en) Method, apparatus and storage medium for determining encoding parameter set
CN109634688B (en) Session interface display method, device, terminal and storage medium
CN111586444B (en) Video processing method and device, electronic equipment and storage medium
CN110837300B (en) Virtual interaction method and device, electronic equipment and storage medium
CN112565806A (en) Virtual gift presenting method, device, computer equipment and medium
CN113556481B (en) Video special effect generation method and device, electronic equipment and storage medium
CN112419143B (en) Image processing method, special effect parameter setting method, device, equipment and medium
CN114741559A (en) Method, apparatus and storage medium for determining video cover
CN110891181B (en) Live broadcast picture display method and device, storage medium and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant