CN110049377B - Expression package generation method and device, electronic equipment and computer readable storage medium - Google Patents
Expression package generation method and device, electronic equipment and computer readable storage medium
- Publication number
- CN110049377B, CN201910185905.3A, CN201910185905A
- Authority
- CN
- China
- Prior art keywords
- image
- target video
- bullet screen
- target
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/48—Matching video sequences
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/07—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
- H04L51/10—Multimedia information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/52—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail for supporting social networking services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4314—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for fitting data in a restricted space on the screen, e.g. EPG data in a rectangular grid
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
- H04N21/4316—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations for displaying supplemental content in a region of the screen, e.g. an advertisement in a separate window
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/4788—Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/485—End-user interface for client configuration
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/488—Data services, e.g. news ticker
- H04N21/4884—Data services, e.g. news ticker for displaying subtitles
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Marketing (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computing Systems (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The invention provides an expression package generation method and device, an electronic device and a computer readable storage medium. The method includes: judging whether a playing parameter of a target video meets a first preset condition; if the playing parameter of the target video meets the first preset condition, identifying, in the target video, a first image matched with a target bullet screen, where the target bullet screen is a bullet screen that meets a second preset condition; and generating an expression package according to the first image. Because the images used to generate the expression package are obtained automatically from the bullet screens (danmaku) of the target video, the expression package is generated automatically and its generation steps are simplified.
Description
Technical Field
The invention relates to the technical field of multimedia, in particular to an expression package generation method and device, electronic equipment and a computer readable storage medium.
Background
In the mobile internet era, social software of all kinds has become part of everyday life. When communicating on social platforms, users have gradually shifted from plain text, to simple symbols and emoticons, and then to expressing themselves with expression packages (meme images).
At present, an expression package is usually made by first collecting material from currently popular stars, cartoons, movies and the like, and then editing that material with a video editing tool to generate the expression package. This process requires manual clipping and drawing and is therefore cumbersome to operate.
The expression package generation schemes in the related art therefore generally suffer from the problem of cumbersome manual operation.
Disclosure of Invention
The invention provides an expression package generation method and device, electronic equipment and a computer readable storage medium, and aims to solve the problem of complex manual operation of expression package generation schemes in the related art.
In order to solve the above problem, according to an aspect of the present invention, the present invention discloses an expression package generating method, including:
judging whether the playing parameters of the target video meet a first preset condition or not;
if the playing parameters of the target video meet the first preset condition, identifying a first image matched with a target bullet screen in the target video, wherein the target bullet screen is a bullet screen meeting a second preset condition;
and generating an expression package according to the first image.
According to another aspect of the present invention, the present invention also discloses an expression package generating device, including:
the judging module is used for judging whether the playing parameters of the target video meet a first preset condition or not;
the first identification module is used for identifying a first image matched with a target bullet screen in the target video if the playing parameter of the target video meets the first preset condition, wherein the target bullet screen is a bullet screen meeting a second preset condition;
and the first generation module is used for generating the expression package according to the first image.
According to another aspect of the present invention, the present invention also discloses an electronic device, comprising: the system comprises a memory, a processor and an expression package generation program which is stored on the memory and can run on the processor, wherein when the expression package generation program is executed by the processor, the steps of the expression package generation method are realized.
According to still another aspect of the present invention, the present invention also discloses a computer readable storage medium, on which an emoticon generation program is stored, which when executed by a processor implements the steps in the emoticon generation method according to any one of the above.
Compared with the prior art, the invention has the following advantages:
By judging whether the playing parameter of the target video meets the first preset condition and, when it does, identifying in the target video the first image matched with the target bullet screen (a bullet screen that meets the second preset condition) and using the first image to generate the expression package, the embodiment of the invention automatically obtains, from the bullet screens of the target video, the images used to generate the expression package. The expression package is thus generated automatically and its generation steps are simplified.
Drawings
FIG. 1 is a first flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 2 is a second flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 3 is a third flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 4 is a fourth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 5 is a fifth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 6 is a sixth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 7 is a seventh flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 8 is an eighth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 9 is a ninth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 10 is a tenth flowchart of steps of an embodiment of an expression package generation method according to the present invention;
FIG. 11 is a schematic view of an embodiment of an emoticon;
FIG. 12 is a schematic view of another embodiment of an emoticon;
fig. 13 is a block diagram of an embodiment of an expression package generation apparatus according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of an expression package generation method according to the present invention is shown, which may specifically include the following steps:
Step 101: judging whether a playing parameter of the target video meets a first preset condition.
After receiving an expression package generation request for a target video, the method of the embodiment of the present invention may execute step 101. The target video may be a complete video or a video segment, which is not limited in the present invention.
The playing parameter may include, but is not limited to, the online duration (how long the video has been available) and the number of bullet screens.
Optionally, when the playing parameter includes an online duration, in step 101, it may be determined whether the online duration of the target video is greater than or equal to a preset duration threshold; if the online time of the target video is greater than or equal to a preset time threshold, the playing parameter of the target video meets a first preset condition; and if the online time of the target video is less than a preset time threshold, the playing parameter of the target video does not meet a first preset condition.
The preset duration threshold may be determined empirically, for example 24 hours. Experience shows that after a video has been online for 24 hours, its bullet screens are both numerous and stable, so the bullet screens can be used to obtain the first image for generating the expression package.
Optionally, when the playing parameter includes the number of barrage, in step 101, it may be determined whether the number of barrage of the target video is greater than or equal to a preset number threshold and whether the stability of the number of barrage meets a preset stability condition; if the number of the barrage of the target video is greater than or equal to a preset number threshold and the stability of the number of the barrage meets a preset stability condition, the playing parameter of the target video meets a first preset condition; if the number of the barrage of the target video is smaller than a preset number threshold value, or the stability of the number of the barrage does not meet a preset stability condition, the playing parameter of the target video does not meet a first preset condition.
The preset number threshold may be 100, and the stability may represent a fluctuation interval of the bullet screen number in the latest period of time, for example, a fluctuation range is in a preset number interval, for example, 100 to 200, which indicates that the stability meets a preset stability condition.
After the target video has been published for a period of time, its bullet screens are numerous and their number is stable, for example staying within the range of 100 to 200. In that case the bullet screens can be used to acquire the first image for generating the expression package, so that the generated expression package is more likely to be popular with users.
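As a rough illustrative sketch (not part of the patent text), the two checks described above could be expressed as follows in Python; all threshold values, the stability interval and the function names are assumptions chosen for illustration only.

```python
# Illustrative sketch of the first-preset-condition check; thresholds are assumed values.
ONLINE_DURATION_THRESHOLD_HOURS = 24   # preset duration threshold, e.g. 24 hours
BARRAGE_COUNT_THRESHOLD = 100          # preset number threshold
STABILITY_INTERVAL = (100, 200)        # preset fluctuation interval for the stability condition

def duration_condition_met(online_hours):
    """Check based on online duration: true once the video has been online long enough."""
    return online_hours >= ONLINE_DURATION_THRESHOLD_HOURS

def barrage_condition_met(recent_barrage_counts):
    """Check based on bullet screen count: the current count is large enough and the
    counts sampled over the recent period stay within the preset fluctuation interval."""
    if not recent_barrage_counts:
        return False
    count_ok = recent_barrage_counts[-1] >= BARRAGE_COUNT_THRESHOLD
    stable = all(STABILITY_INTERVAL[0] <= c <= STABILITY_INTERVAL[1]
                 for c in recent_barrage_counts)
    return count_ok and stable
```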
If the playing parameters of the target video meet the first preset condition, step 102, identifying a first image matched with a target bullet screen in the target video, wherein the target bullet screen is a bullet screen meeting a second preset condition;
The bullet screens of a video indicate which frame images or segments of the video are popular with users, and images or segments that are popular with users are usually suitable as material for generating an expression package. Therefore, the method of the embodiment of the invention can screen out, from the bullet screens of the target video, the target bullet screens that meet the second preset condition, and identify, from the target video, the first image matched with the target bullet screens. The first image may be a human face image, an animal face image, a landscape image, or any other image matched with the target bullet screen.
Optionally, in an embodiment, the target bullet screen in the embodiment of the present invention may be a bullet screen with a higher heat value (i.e., more popular with the user) in multiple bullet screens of the target video, and since there may be multiple bullet screens at the same time point, there may be a case where two target bullet screens match the same first image.
Optionally, in another embodiment, the target bullet screen of the embodiment of the present invention may also be a bullet screen posted at a time point where the bullet screen heat in the target video is relatively high. For example, for a target video with a duration of 2 hours, if the bullet screen heat at the playing time of 1 hour 11 minutes is relatively high, the bullet screens at 1 hour 11 minutes are the target bullet screens.
Step 103, generating an expression package according to the first image.
In this way, by judging whether the playing parameter of the target video meets the first preset condition and, when it does, identifying from the target video the first image matched with the target bullet screen (a bullet screen that meets the second preset condition) and using the first image to generate the expression package, the embodiment of the invention automatically obtains, from the bullet screens of the target video, the images used to generate the expression package, thereby generating the expression package automatically and simplifying its generation steps.
Optionally, in another embodiment, referring to fig. 2, the method according to an embodiment of the present invention may further include:
if the playing parameter of the target video meets the first preset condition, step 104, identifying a second image of which the expression belongs to a preset expression type in the target video;
among them, the method of the embodiment of the invention sets various expression types in advance, such as anger, disgust, fear, happiness, dull, sadness, surprise and the like.
In a case that it is determined that the playing parameter of the target video meets the first preset condition, the method according to the embodiment of the present invention may perform expression type recognition on frame-by-frame images in the target video, recognize at least one frame of original frame image whose expression belongs to a preset expression type, and then extract a face region from the at least one frame of original frame image to obtain a second image, where the second image may be an image of a face region or an image of a moving picture face region, and the present invention does not limit this.
Since the expression package generally refers to expressions possessed by a common face, in general, the expressions represented by the expression package are exaggerated, for example, the surprise expressions shown in fig. 11 and 12. Therefore, the method of the embodiment of the present invention sets, for example, a plurality of types representing exaggerated expressions in advance, and aims to extract partial images of the face having these exaggerated expression types from the target video and generate an expression package.
Alternatively, when step 104 is executed, it may be implemented by the method shown in fig. 3:
S201, acquiring each frame of image in the target video;
Each frame of image in the target video can be captured and saved using a computer vision library, such as the open source computer vision library OpenCV.
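A minimal sketch of S201 using OpenCV's Python bindings; the generator-based interface is an assumption made here for convenience, not something prescribed by the patent.

```python
import cv2  # OpenCV, as mentioned above

def extract_frames(video_path):
    """Yield (frame_index, frame) pairs for every frame of the target video."""
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:          # no more frames
            break
        yield index, frame
        index += 1
    cap.release()
```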
S202, carrying out face region detection on each frame of image, and generating a local image containing a face region from each frame of image with the detected face region;
In one example, face region detection may be performed on each frame of image, so as to detect the multiple frames of images in the target video that contain a face region; the face region is then cropped out of each of those frames to generate a plurality of partial images containing a face region.
Optionally, the method of the embodiment of the present invention may further add a boundary region around each cropped face region, so that the generated partial image contains not only the face region but also an extra image border around it, which improves the display effect of the expression package.
Optionally, before S203, in order to improve the accuracy of facial feature extraction, the method of the embodiment of the present invention may further preprocess each partial image. Because the original frame images in a video are captured under varied conditions and subject to random interference, the error rate of the extracted facial features would otherwise increase; therefore the method of the embodiment of the present invention may apply preprocessing such as gray-level correction and noise filtering to the partial images. The specific preprocessing steps may include, but are not limited to, face alignment, light compensation, gray-scale transformation, histogram equalization, normalization, geometric correction, median filtering, face segmentation and sharpening.
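A minimal sketch of one possible preprocessing chain with OpenCV; which steps are applied, their order and the normalized size are assumptions, since the patent only lists candidate operations.

```python
import cv2

def preprocess_partial_image(face_bgr):
    """Example preprocessing of a partial (face-region) image before feature extraction."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)  # gray-scale transformation
    equalized = cv2.equalizeHist(gray)                 # histogram equalization
    denoised = cv2.medianBlur(equalized, 3)            # median filtering (noise removal)
    normalized = cv2.resize(denoised, (128, 128))      # normalization to an assumed size
    return normalized
```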
S203, extracting facial features of each local image to obtain target facial feature information;
for example, each partial image is an image including a face region, and a face feature extraction algorithm may be used to extract a face feature from each partial image, so that each partial image corresponds to target face feature information.
Facial features may include, but are not limited to, facial features, hair features, and the like.
S204, respectively inputting the feature information of each target face into a pre-trained expression classifier;
Taking a human face as an example, an expression classifier is trained in advance in the embodiment of the present invention. For the input target facial feature information (corresponding to one partial image), the expression classifier identifies the probabilities that the partial image belongs to each of the different preset exaggerated expression types.
For example, the expression classifier may classify the 7 preset exaggerated expression types (anger, disgust, fear, happiness, dullness, sadness, surprise). After the target facial feature information of a partial image is input into the expression classifier, the classifier outputs 7 probability values corresponding to the 7 preset exaggerated expression types, respectively.
S205, acquiring the probability that each local image output by the expression classifier belongs to different preset expression types;
as described in the foregoing example, the method according to the embodiment of the present invention may obtain 7 probability values of any one local image from the output result of the expression classifier, where the 7 probability values respectively represent probabilities that expression types of a face region in the local image respectively belong to the 7 preset exaggerated expression types.
And S206, identifying the local image with the probability greater than the preset probability threshold value as a second image.
The same facial image can belong to several preset exaggerated expression types at the same time; for example, the expression of a certain face may belong to both dullness and sadness. Therefore, for the 7 output probabilities corresponding to one partial image, if any of the 7 probabilities is greater than the preset probability threshold, the partial image hits a preset expression type of the embodiment of the present invention and may be identified as a second image; conversely, if all 7 probabilities are less than or equal to the preset probability threshold, the partial image does not hit any preset expression type of the embodiment of the present invention and is not processed further. In this way, the plurality of partial images containing a face region in the target video are screened, at least one partial image whose facial expression conforms to at least one of the 7 preset exaggerated expression types is obtained, and the at least one partial image obtained by screening is the second image in the embodiment of the present invention.
In this way, the method of the embodiment of the invention uses the pre-trained expression classifier to recognize and classify the expressions of the partial images containing a face region in the target video, so that at least one partial image whose expression belongs to a preset expression type, namely the second image, can be screened out from the plurality of partial images, and the expression package is generated based on the second image.
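The sketch below strings S202 to S206 together; the Haar-cascade face detector, the 0.6 probability threshold and the `extract_features` / `expression_classifier` callables stand in for the face detection, feature extraction and pre-trained classifier components and are assumptions, not the patent's concrete choices.

```python
import cv2

EXPRESSION_TYPES = ["anger", "disgust", "fear", "happiness", "dullness", "sadness", "surprise"]
PROBABILITY_THRESHOLD = 0.6   # preset probability threshold (assumed value)
BORDER = 20                   # extra image boundary added around each face region

face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def select_second_images(frames, extract_features, expression_classifier):
    """Return (frame_index, partial_image) pairs whose expression hits a preset type."""
    selected = []
    for index, frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_detector.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5):
            y0, x0 = max(y - BORDER, 0), max(x - BORDER, 0)
            partial = frame[y0:y + h + BORDER, x0:x + w + BORDER]     # face region plus border
            probs = expression_classifier(extract_features(partial))  # one probability per type
            if max(probs) > PROBABILITY_THRESHOLD:                    # hits a preset expression type
                selected.append((index, partial))
    return selected
```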
Alternatively, in one embodiment, when step 102 is executed, it may be implemented by S301 and S302 shown in fig. 4:
S301, identifying first time point information matched with a target bullet screen in the target video;
Each bullet screen in the target video corresponds to one time point in the video, and one time point can have multiple bullet screens. Therefore, the time point information matched in the target video by the target bullet screen, namely the first time point information, can be acquired.
For example, when a user watches a target video, a target barrage is sent out at 1 minute 50 seconds of the playing time of the target video, and the first time point information matched by the target barrage in the target video is 1 minute 50 seconds here.
Optionally, S302, a first image in the target video that matches the first time point information is identified.
The target video is composed of a plurality of continuous frames of images, each frame of image corresponds to a time point in the playing time of the target video, and therefore, the original frame image corresponding to the first time point in the target video can be identified. Alternatively, the face image in the original frame image may also be acquired. The first image may be an original frame image or the face image. In the above example, if the first time point information is 1 minute 50 seconds, the first image matched with the first time point information is the face image in the frame of image played in 1 minute 50 seconds in the target video, or the frame of image.
In this way, by identifying the first time point information of the target bullet screen among the plurality of bullet screens of the target video, the embodiment of the present invention can locate, according to the first time point information, the first image in the target video that the target bullet screen describes, thereby obtaining an image in the target video that users are interested in.
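For S302, locating the frame that a first time point refers to can be as simple as the sketch below; the constant-frame-rate assumption and the helper name are illustrative only.

```python
def frame_indices_for_time_points(time_points_seconds, fps):
    """Map first time point information (seconds into the video) to frame indices,
    assuming a constant frame rate."""
    return [int(round(t * fps)) for t in time_points_seconds]

# e.g. a target bullet screen posted at 1 min 50 s of a 25 fps video maps to frame 2750
print(frame_indices_for_time_points([110], fps=25))  # [2750]
```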
Alternatively, when S301 is executed, it may be implemented by the method shown in fig. 5:
S401, acquiring bullet screen heat corresponding to each time point in the target video;
each playing time point in the target video may correspond to zero or more barrages, and therefore, the barrage heat of each playing time point in the target video may be obtained. The unit of the playing time point can be flexibly configured according to the requirement, for example, the minimum unit of the playing time point is 1 second, or 0.1 second. Taking the minimum unit of seconds as an example, here, the bullet screen heat per second in the target video is obtained.
Because the video content played in each second in the target video is different, the barrage launched by the user in each second is also different, and therefore, the heat of the barrage in each second is also different.
Optionally, when S401 is executed, the number M of barrages corresponding to each time point in the target video may be obtained, where M is a positive integer; then, acquiring the bullet screen heat degree of each of the M bullet screens; and finally, acquiring the bullet screen heat corresponding to each time point according to the bullet screen heat of each bullet screen and the bullet screen quantity M.
For example, the heat of the bullet screen per second is positively correlated with the number of the bullet screens in the second, so that the number M of the bullet screens in the target video per second can be obtained in order to obtain the heat of the bullet screen in the target video per second, then the heat of the bullet screen of each of the M bullet screens is obtained, and finally, the heat of the bullet screens of the M bullet screens in the second can be summed to obtain the heat of the bullet screen per second.
Thus, when the bullet screen heat corresponding to each time point in the target video is acquired, the number of bullet screens at a time point reflects the bullet screen heat at that time point, so the heat of each of those bullet screens can be acquired and the bullet screen heat at the time point can be calculated from them. Because the expression package to be generated is an image with an exaggerated expression, the embodiment of the invention analyzes the bullet screens to obtain the bullet screen heat at each time point; this heat directly reflects the users' reactions to the expressions and actions of the characters in the video, so the generated expression package is one that users are interested in.
Optionally, when the step of obtaining the bullet screen heat of each of the M bullet screens is executed, the user evaluation parameters and the bullet screen contents of each of the M bullet screens may be obtained; and then, acquiring the bullet screen heat of each bullet screen in the M bullet screens according to the user evaluation parameters and the bullet screen contents.
Specifically, because the number of bullet screens reflects the bullet screen heat of a time point in only one dimension, namely how strongly users reacted to the character in the video, the embodiment of the present invention may also compute a heat value for each individual bullet screen at a time point. The heat of a single bullet screen may be obtained from two aspects: the user evaluation parameters and the bullet screen content.
The user evaluation parameters may include, but are not limited to, the number of likes and the number of reports; the degree of correlation between the bullet screen content and the target video (which may be embodied as a bullet screen score) is positively correlated with the heat of the single bullet screen.
Therefore, the heat of a single bullet screen can be defined from the above dimensions, where the like count and the bullet screen score are positively correlated with the heat of the single bullet screen, and the report count is negatively correlated with it.
In one example, a weight may be preset for each of the like count, the report count and the bullet screen score, and the three parameters are then weighted and summed to obtain the heat of the single bullet screen.
For the barrage scoring, the barrage content may be input to a scoring network trained in advance to obtain a barrage scoring of the barrage content, where the higher the relevancy of the barrage content to the content of the target video (e.g., the name of a main actor, the name of a character, a line of action in a drama, etc.), the higher the scoring is, the lower the relevancy is, and the lower the scoring is.
Thus, in the embodiment of the invention, when the bullet screen heat at each time point in the target video is obtained, not only the number of bullet screens at the time point but also the heat of each individual bullet screen at the time point is considered, and when the heat of each bullet screen is obtained, the user evaluation parameters and the bullet screen content of that bullet screen are combined. The bullet screen heat obtained for each time point therefore directly reflects how much users like the characters in the target video they are watching, so the generated expression package is more likely to be favored by users.
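A sketch of the heat computation described above; the linear weighting and the data layout are assumptions — the patent only states that likes and the bullet screen score contribute positively, reports negatively, and that per-time-point heat sums the heat of its M bullet screens.

```python
def single_barrage_heat(like_count, report_count, content_score,
                        w_like=1.0, w_report=1.0, w_score=1.0):
    """Heat of a single bullet screen as a weighted sum (weights are assumed)."""
    return w_like * like_count - w_report * report_count + w_score * content_score

def heat_per_time_point(barrages_by_second):
    """barrages_by_second: {second: [(likes, reports, score), ...]} ->
    {second: summed heat of the M bullet screens at that second}."""
    return {
        second: sum(single_barrage_heat(likes, reports, score)
                    for likes, reports, score in barrages)
        for second, barrages in barrages_by_second.items()
    }
```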
S402, identifying first time point information corresponding to the bullet screen heat degrees of a first preset number according to the sequence of the bullet screen heat degrees from high to low;
In this step, the bullet screen heat values of the time points in the target video may be sorted from high to low, and a first preset number of the highest bullet screen heat values, for example the top 5, are identified. Each bullet screen heat value corresponds to one piece of first time point information, so the first time point information (i.e., the playing time) corresponding to the top 5 bullet screen heat values in the target video can then be identified.
Or S403, identifying first time point information corresponding to the bullet screen heat degree of which the bullet screen heat degree is greater than the preset heat degree threshold value according to the sequence of the bullet screen heat degrees from high to low.
In step S401, the bullet screen heat at each time point in the target video may be counted, so that in this step, at least one bullet screen heat at which the bullet screen heat is greater than the preset heat threshold may be identified, and then, the first time point information (i.e., the playing time) corresponding to the at least one bullet screen heat in the target video is identified.
When the first time point information corresponding to the target bullet screens is acquired, the combination of S401 and S402, or the combination of S401 and S403, may be adopted.
Optionally, in the method according to the embodiment of the present invention, the first time point information obtained through S402 or S403 may be saved, so that the first time point information may be used later.
In this way, in the embodiment of the present invention, the bullet screen heat at each time point in the target video is obtained, and then either the first time point information corresponding to the bullet screen heat values that exceed the preset heat threshold, or the first time point information corresponding to the several highest bullet screen heat values, is acquired, so as to obtain the first images corresponding to the first time point information in the target video. Through this analysis of the bullet screen heat in the target video, first images that users found interesting and appealing while watching the target video, corresponding to the first time points, can be obtained, so that the generated expression package meets the needs of most users.
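S402 and S403 can both be expressed as a selection over the per-second heat map computed above; the sketch below is illustrative and its parameter names are assumptions.

```python
def select_first_time_points(heat_by_second, top_n=None, heat_threshold=None):
    """Select first time point information either as the top-N hottest seconds (S402)
    or as every second whose bullet screen heat exceeds a preset threshold (S403)."""
    ranked = sorted(heat_by_second.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        return [second for second, _ in ranked[:top_n]]
    return [second for second, heat in ranked if heat > heat_threshold]
```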
Step 105: acquiring the same target image between the first image and the second image. When the specific implementation of step 102 includes S302, then when step 105 is executed, the target image may also be obtained by directly taking the image intersection between the first images and the second images.
Optionally, in another embodiment, as shown in fig. 6, when the method of the embodiment includes S301, before performing step 105, the method of the embodiment of the present invention may further include S303, and when step 105 is performed, the method may be implemented by S304 and S305:
S303, acquiring second time point information matched with the second image in the target video;
The second image identified from the target video in step 104 is a partial face image whose expression belongs to a preset exaggerated expression type, and each second image also corresponds to unique playing time information in the target video, namely the second time point information. The second time point information corresponding to each second image can therefore be obtained.
S304, obtaining the same target time point information between the first time point information and the second time point information;
because the facial expression in the second image corresponding to the second time point information belongs to the preset expression type, and the first image corresponding to the first time point information corresponds to a higher pop-up heat degree, the intersection of the two time point information, namely the same target time point information, and the corresponding first image or second image has the exaggerated facial expression and is loved by the user. Therefore, the target time point information can be acquired here.
S305, screening out a target image matched with the target time point information from the first image or the second image.
The target image (which may be a partial image of a human face) may be selected from the first image and may also be selected from the second image.
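S304 and S305 amount to a set intersection over time points; the dictionary layout below is an assumption used only to keep the sketch short.

```python
def select_target_images(first_time_points, second_images_by_time_point):
    """Keep only the images whose time point appears in both sets (S304/S305).
    `second_images_by_time_point` maps a second-image time point to its partial image."""
    common = set(first_time_points) & set(second_images_by_time_point)
    return {t: second_images_by_time_point[t] for t in sorted(common)}
```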
In this way, the embodiment of the present invention identifies, in the target video, the first image matched with the target bullet screen and the second image having a preset expression type, acquires the same target time point information based on the first time point information and the second time point information that the two images respectively correspond to in the target video, and acquires the target image (selected from the first images or the second images) matched with the target time point information to generate the expression package. Because the target image obtained from the target video based on the time point information both carries an exaggerated expression and is liked by users, the expression package is generated automatically, its generation steps are simplified, the automatically generated expression package fits users' interests, and users are spared the trouble of searching for expression package material. In addition, because the number of first images acquired from the bullet screens alone is large, generating the expression package from all of them would take a long time and reduce the generation efficiency; taking the intersection with the second images reduces the number of candidate images and thus improves the generation efficiency.
As shown in fig. 2 and fig. 6, when step 103 is executed, an expression package is generated according to the target image.
The execution sequence between step 102 and step 104 is not limited by the present invention.
For the principle of the specific implementation of generating the expression package according to the target image, reference may be made to S501 to S505 shown in fig. 7 (described below using the first image as an example), which will not be repeated here. When subtitles are added to the generated expression package, they may be added to the expression package generated from the target image by following the principle of the methods shown in fig. 8 and fig. 9 described below, which will also not be repeated here.
By means of the technical scheme of the embodiment of the invention, the first image matched with the target bullet screen and the second image having a preset expression type are identified in the target video, the target image common to the two is acquired, and the expression package is generated from the target image. The method of the embodiment of the invention automatically identifies the target image suitable for use as an expression package and generates the expression package without user operation, which simplifies the manual steps. Moreover, the target image both has an exaggerated expression and corresponds to a target bullet screen, and the bullet screens reflect users' viewing experience of each frame of the video, so the generated expression package also meets users' expectations for expression packages. In addition, the user does not need to search for expression package material, so the generation efficiency of the expression package can be improved.
Alternatively, when step 103 is executed, it may be implemented by the method shown in fig. 7:
S501, acquiring the frame numbers of the first images in the target video;
The number of first images may be one or more. In most scenarios there are several first images, and each first image has a corresponding frame number Q in the target video, i.e. it corresponds to the Qth frame image of the target video.
S502, according to the frame number, a plurality of first images with continuous frame numbers in the first images and/or a single first image with isolated frame numbers are identified;
the first image identified in step 102 may be a continuous image of several frames, or may be a discontinuous image of several frames.
For example, the plurality of original frame images corresponding to the plurality of first images are respectively the 10 th frame, the 11 th frame, the 12 th frame, the 20 th frame and the 50 th frame in the target video.
Then this step can identify three first images with consecutive frame numbers according to the above frame numbers, namely three first images corresponding to the 10 th frame, the 11 th frame and the 12 th frame, respectively, and a single first image corresponding to the 20 th frame and a single first image corresponding to the 50 th frame. Here, the description will be given taking as an example that the plurality of first images include both a plurality of frames of first images whose number of frames is continuous and a single first image whose number of frames is isolated. However, it should be noted that the method of the present invention is not limited to the above examples.
S503, generating a dynamic picture from the plurality of first images with consecutive frame numbers, to serve as the expression package;
For example, the three first images with consecutive frame numbers described above may be combined into a dynamic picture serving as the expression package.
and/or,
S504, generating a static picture from a single first image with an isolated frame number, to serve as the expression package;
For example, the single first image corresponding to the 20th frame and the single first image corresponding to the 50th frame are used to generate two static pictures, respectively.
and/or,
S505, generating a video clip from a plurality of target original frame images that respectively match the plurality of first images with consecutive frame numbers, the video clip serving as the expression package.
For example, three first images of consecutive frames as described above may acquire three original frame images respectively matching them, i.e., the 10 th frame image, the 11 th frame image, and the 12 th frame image in the target video, and generate video clips from the three frame images.
In this embodiment of the present invention, the execution sequence of S503, S504, and S505 is not limited.
In this way, depending on how the acquired first images for making the expression package are distributed, the embodiment of the invention can generate a dynamic expression package, namely a dynamic picture, from a plurality of first images with consecutive frame numbers; it can also generate a video clip from the plurality of original frame images corresponding to those first images; and for a single first image with an isolated (non-consecutive) frame number, it can generate a static expression package, namely a static picture. The variety of expression packages generated by the embodiment of the invention is therefore richer, giving users more choices.
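The frame-number grouping behind S502 to S505 can be sketched as follows; how the runs are then encoded into an animated picture or video clip is left out, since the patent does not fix a container format.

```python
def group_frame_numbers(frame_numbers):
    """Split frame numbers into runs of consecutive frames (-> dynamic picture or
    video clip) and isolated frames (-> static picture), per S502-S505."""
    runs, current = [], []
    for n in sorted(frame_numbers):
        if current and n == current[-1] + 1:
            current.append(n)
        else:
            if current:
                runs.append(current)
            current = [n]
    if current:
        runs.append(current)
    consecutive = [run for run in runs if len(run) > 1]   # -> dynamic picture / video clip
    isolated = [run[0] for run in runs if len(run) == 1]  # -> static picture
    return consecutive, isolated

# e.g. [10, 11, 12, 20, 50] -> ([[10, 11, 12]], [20, 50])
```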
Optionally, in an embodiment, as shown in fig. 11 and 12, a subtitle may be further added to the emoticon, and therefore, in the embodiment of the present invention, after step 103, as shown in fig. 8, the method according to the embodiment of the present invention may further include:
S601, acquiring, from the target video, a second preset number of original frame images that are arranged before, and adjacent to, the frame numbers corresponding to the first images in the expression package;
the following description is made by taking an example in which the first images identified in step 102 are plural, and the plural original frame images corresponding to the plural first images are respectively the 10 th frame, the 11 th frame, the 12 th frame, the 20 th frame and the 50 th frame in the target video.
As a result of step 103, a dynamic picture has been generated from the consecutive three frames of first images, and/or a video clip is generated from the three frames of original frame images corresponding to the consecutive three frames of first images; and/or two still pictures are respectively generated by the two first images of the 20 th frame and the 50 th frame.
When subtitles are added to each expression package generated in step 103, a recurrent neural network algorithm may be used to predict the scene from the lines (dialogue) in the few minutes preceding the playing time corresponding to each expression package in the target video, so that subtitles matching the plot are added to each expression package.
In this step, taking the example that the emoticon includes the above-mentioned moving picture, a second preset number (for example, 5 frames) of original frame images arranged before the 10 th to 12 th frames and adjacent to the first image of the three frames can be obtained in the target video, that is, original frame images of the 5 th to 9 th frames in the target video are obtained.
When the expression package is a static picture or a video clip, the execution principle of this step is similar to that when the expression package is a dynamic picture, and is not described in detail here.
S602, acquiring subtitle data matched with the original frame image;
because the subtitle data of the target video corresponds to each original frame image in the target video, the subtitle data corresponding to the original frame images of the 5 th to 9 th frames can be acquired.
S603, identifying a conversation scene according to the subtitle data;
the method of the embodiment of the invention can input the caption data into the scene recognition network trained in advance, thereby recognizing the conversation scene according to the input caption data.
For example, if the subtitle data is "How annoying you are, I am going to get angry", the identified conversation scene is "angry".
S604, generating text data according to the conversation scene;
the method of the embodiment of the invention can input the conversation scene into the expression package subtitle generating network trained in advance to obtain the text data added into the expression package. For example, "Haoshengyan! ".
S605, adding the text data into the expression package.
Fig. 11 and 12 show two emoticons to which text data is added, and the present invention is not limited to the adding position of the text data in the emoticons.
Therefore, the embodiment of the invention predicts the conversation scene from the subtitle data of the original frame images that precede, in the target video, the first image corresponding to the generated expression package, and generates the text data to be added to the expression package accordingly, so that the added text data fits the conversation scene and the semantic match between the text data and the expression package is improved.
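A sketch of S605, drawing the generated text data onto the expression package image with Pillow; the font file, caption position and colors are assumptions, as the patent explicitly leaves the placement of the text open.

```python
from PIL import Image, ImageDraw, ImageFont  # Pillow, used here only for illustration

def add_caption(emoticon_path, text, output_path, font_path="font.ttf", font_size=32):
    """Overlay the generated text data onto the expression package image."""
    image = Image.open(emoticon_path).convert("RGB")
    draw = ImageDraw.Draw(image)
    font = ImageFont.truetype(font_path, font_size)
    width, height = image.size
    # Assumed placement: near the bottom edge, white text with a black outline.
    draw.text((width // 10, height - font_size - 10), text, font=font,
              fill="white", stroke_width=2, stroke_fill="black")
    image.save(output_path)
```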
Optionally, in an embodiment, after step 103, when adding subtitles to the generated emoticon, subtitles may also be added to the emoticon according to a bullet screen, as shown in fig. 9, the method according to the embodiment of the present invention may further include:
S701, acquiring the target bullet screen content that the first image corresponding to the expression package matches in the target video;
Specifically, the first image corresponding to the expression package may be acquired together with its matched time point information in the target video (i.e. the playing time in the video); the target bullet screen content matched with that time point information is then acquired.
For example, the generated dynamic picture is generated by three first images corresponding to the 10 th to 12 th frame original frame images, so that the playing time information of the 10 th to 12 th frame original frame images (or the three first images) in the target video can be acquired, and then three sets of target bullet screen contents respectively matched with the three playing time information are acquired.
Each set of target barrage content may include one or more barrages.
S702, generating text data according to the target bullet screen content;
when the number of the target barrage contents is one, the target barrage contents can be directly generated into text data for the expression package, or the semantics of the target barrage contents are acquired and generated into the text data.
When there are multiple pieces of target bullet screen content, in order to avoid adding too much subtitle content to the expression package, the multiple pieces of target bullet screen content may be screened to obtain a high-quality bullet screen, that is, the bullet screen with the highest bullet screen score (for the specific implementation of bullet screen scoring, refer to the description in the above embodiment, which is not repeated here). Then, text data for the expression package is generated from the target bullet screen content with the highest bullet screen score, or the semantics of that content are extracted and generated as the text data.
S703, adding the text data into the expression package.
In this way, the embodiment of the invention acquires the target bullet screen content that the first image corresponding to the generated expression package matches in the target video, generates text data for the expression package from that bullet screen content, and adds the text data to the expression package as a subtitle. The subtitle of the expression package thus matches, semantically, the bullet screens posted by users for the original frame images corresponding to the expression package, so the caption added to the expression package meets users' expectations for expression packages.
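S701 to S703 reduce to picking caption text from the matched bullet screens; `score_fn` stands in for the bullet screen scoring network mentioned earlier and is an assumption.

```python
def caption_from_barrages(target_barrage_contents, score_fn):
    """Choose the subtitle text for the expression package: a single matched bullet
    screen is used directly, otherwise the one with the highest bullet screen score."""
    if not target_barrage_contents:
        return None
    if len(target_barrage_contents) == 1:
        return target_barrage_contents[0]
    return max(target_barrage_contents, key=score_fn)
```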
Optionally, in another embodiment, as shown in fig. 10, a flowchart illustrating steps of another method for generating an expression package according to the present invention is shown, and specifically includes:
After step 101, if the playing parameter of the target video does not meet the first preset condition, step 106 is executed: identifying, in the target video, a third image whose expression belongs to a preset expression type;
the execution principle of step 106 is similar to that of step 104, and is not described herein again, where the second image and the third image may be the same.
And step 107, generating an expression package according to the third image.
Specifically, in the initial stage after the target video is released, the number of its bullet screens is small and not yet stable, so if the images for generating the expression package were obtained with reference to the bullet screens, the generated expression package would not meet users' preferences. Therefore, in this embodiment, the expression package can be generated directly from the third image whose facial expression belongs to a preset expression type.
The principle of a specific implementation method for generating the expression package according to the third image may refer to S501 to S505 shown in fig. 7 (here, the description is made by using the first image to generate the expression package), and details are not repeated here.
In this way, in the embodiment of the present invention, in the initial stage of online of the target video, under the condition that the number of the bullet screens is not large and not stable enough, the expression package is generated by identifying the third image of which the expression belongs to the preset expression type. Therefore, the expression package can be generated by adopting different strategies according to different conditions of the playing parameters of the target video, and the manufacturing flexibility of the expression package is improved.
Optionally, after step 107, the subtitles may also be added to the emoticon generated by using the third image by using the principle of the method shown in fig. 8 and fig. 9, which is not described herein again, and for details, refer to the foregoing.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Corresponding to the method provided by the embodiment of the present invention, referring to fig. 13, a block diagram of an embodiment of an expression package generating device according to the present invention is shown, and the expression package generating device specifically includes the following modules:
the judging module 131 is configured to judge whether a playing parameter of the target video meets a first preset condition;
a first identifying module 132, configured to identify a first image matched with a target bullet screen in the target video if the playing parameter of the target video meets the first preset condition, where the target bullet screen is a bullet screen meeting a second preset condition;
a first generating module 133, configured to generate an expression package according to the first image.
Optionally, the apparatus further comprises:
the second identification module is used for identifying a second image of which the expression belongs to a preset expression type in the target video if the playing parameter of the target video meets the first preset condition;
the first acquisition module is used for acquiring a same target image between the first image and the second image;
the first generating module 133 is further configured to generate an expression package according to the target image.
Optionally, the first identification module 132 includes:
the first identification submodule is used for identifying first time point information matched with a target bullet screen in the target video;
the second identification submodule is used for identifying a first image matched with the first time point information in the target video;
Optionally, the apparatus further comprises:
the second acquisition module is used for acquiring second time point information matched with the second image in the target video;
the first obtaining module comprises:
the first obtaining submodule is used for obtaining the same target time point information between the first time point information and the second time point information;
and the second acquisition sub-module is used for screening out a target image matched with the target time point information from the first image or the second image.
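As a purely illustrative sketch of the screening performed by the first obtaining module, the target images can be taken as the intersection of the two sets of time point information; the function and parameter names below are hypothetical:

```python
def screen_target_images(first_time_points, second_time_points, first_images):
    """Keep the images whose time points occur in both sets.

    first_time_points:  time points matched by the target bullet screens
    second_time_points: time points whose frames show a preset expression type
    first_images:       mapping {time_point: image} for the first images
    """
    target_time_points = sorted(set(first_time_points) & set(second_time_points))
    return [first_images[t] for t in target_time_points]

# e.g. screen_target_images([12.0, 73.5, 90.0], [73.5, 90.0, 120.0], frames)
# returns the images at 73.5 s and 90.0 s
```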
Optionally, the first identification submodule includes:
the third obtaining submodule is used for obtaining the bullet screen heat corresponding to each time point in the target video;
and the third identification submodule is used for identifying, in descending order of bullet screen heat, first time point information corresponding to each of a first preset number of bullet screen heat values, or identifying first time point information corresponding to the bullet screen heat values that are greater than a preset heat threshold.
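The selection performed by the third identification submodule can be sketched as follows; the function name and parameters are hypothetical, and the patent leaves the preset quantity and heat threshold open:

```python
def select_first_time_points(heat_by_time, first_preset_number=None,
                             heat_threshold=None):
    """Pick time points in descending order of bullet screen heat: either the
    first_preset_number hottest ones, or all those above heat_threshold."""
    ranked = sorted(heat_by_time.items(), key=lambda kv: kv[1], reverse=True)
    if first_preset_number is not None:
        ranked = ranked[:first_preset_number]
    elif heat_threshold is not None:
        ranked = [(t, h) for t, h in ranked if h > heat_threshold]
    return [t for t, _ in ranked]
```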
Optionally, the third obtaining sub-module includes:
the first obtaining unit is used for obtaining the number M of the barrages corresponding to each time point in the target video, wherein M is a positive integer;
the second acquisition unit is used for acquiring the bullet screen heat of each of the M bullet screens;
and the third acquisition unit is used for acquiring the bullet screen heat corresponding to each time point according to the bullet screen heat of each bullet screen and the bullet screen quantity M.
Optionally, the second obtaining unit includes:
the first obtaining subunit is used for obtaining the user evaluation parameters and the bullet screen contents of each of the M bullet screens;
and the second obtaining subunit is used for obtaining the bullet screen heat of each bullet screen in the M bullet screens according to the user evaluation parameters and the bullet screen contents.
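Because the patent does not fix how the user evaluation parameters and the bullet screen contents are combined, the following sketch assumes a simple weighting of likes, replies, and a keyword bonus, and sums the per-bullet-screen heats so that the quantity M also contributes; all names, weights, and keywords are assumptions:

```python
ASSUMED_KEYWORDS = ("哈哈", "233", "lol")  # assumed content heuristic

def bullet_screen_heat(likes, replies, content):
    """Heat of one bullet screen from its user evaluation parameters and its
    content; the weights and keyword bonus are assumptions."""
    bonus = 2.0 if any(k in content for k in ASSUMED_KEYWORDS) else 0.0
    return likes + 0.5 * replies + bonus

def time_point_heat(bullet_screens):
    """Heat of a time point from its M bullet screens; summing lets the
    count M contribute alongside the individual heats."""
    return sum(bullet_screen_heat(b.get("likes", 0), b.get("replies", 0),
                                  b["content"])
               for b in bullet_screens)
```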
Optionally, the first generating module 133 includes:
the fourth obtaining submodule is used for obtaining the frame number of the first image in the target video;
a fourth identification submodule, configured to identify, according to the frame numbers, a plurality of first images with consecutive frame numbers among the first images, and/or a single first image with an isolated frame number;
the generation submodule is used for generating a dynamic picture from the plurality of first images with consecutive frame numbers as the expression package; and/or generating a static picture from the single first image with an isolated frame number as the expression package; and/or generating a video clip, as the expression package, from a plurality of target original frame images respectively matched with the plurality of first images with consecutive frame numbers.
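The grouping into consecutive and isolated frame numbers can be illustrated with the following sketch; the function name is hypothetical:

```python
def group_frame_numbers(frame_numbers):
    """Split sorted frame numbers into runs of consecutive numbers: runs longer
    than one frame become dynamic pictures (or video clips together with the
    matched original frames), while single-frame runs become static pictures."""
    runs, current = [], []
    for n in sorted(frame_numbers):
        if current and n == current[-1] + 1:
            current.append(n)
        else:
            if current:
                runs.append(current)
            current = [n]
    if current:
        runs.append(current)
    return runs

# group_frame_numbers([3, 4, 5, 9, 20, 21]) -> [[3, 4, 5], [9], [20, 21]]
```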
Optionally, the apparatus further comprises:
a third obtaining module, configured to obtain, from the target video, a second preset number of original frame images that precede and are adjacent to the frame number corresponding to the first image in the expression package;
the fourth acquisition module is used for acquiring the subtitle data matched with the original frame image;
the third identification module is used for identifying a conversation scene according to the subtitle data;
the second generation module is used for generating text data according to the conversation scene;
and the first adding module is used for adding the text data to the expression package.
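A minimal sketch of this caption-adding path is given below, assuming Pillow is available; picking the last preceding subtitle line stands in for the dialogue-scene recognition and text generation, which the patent does not detail here:

```python
from PIL import Image, ImageDraw  # Pillow

def add_caption(image_path, preceding_subtitles, out_path):
    """Overlay text derived from the subtitles of the preceding, adjacent
    original frames onto the expression-package image."""
    caption = preceding_subtitles[-1] if preceding_subtitles else ""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    width, height = img.size
    # Default bitmap font; a real implementation would load a TTF that
    # covers the subtitle language.
    text_width = draw.textlength(caption)
    draw.text(((width - text_width) / 2, height - 30), caption, fill="white")
    img.save(out_path)
```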
Optionally, the apparatus further comprises:
a fifth obtaining module, configured to obtain the first image corresponding to the expression package and the target bullet screen content matched with the first image in the target video;
the third generation module is used for generating text data according to the target bullet screen content;
and the second adding module is used for adding the text data to the expression package.
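The bullet-screen-based caption path can likewise be sketched; choosing the most-liked matched bullet screen is an assumption, since the patent only requires that text data be generated according to the target bullet screen content:

```python
def caption_from_bullet_screens(matched_bullet_screens):
    """Derive caption text from the target bullet screen content matched
    with the first image of the expression package."""
    if not matched_bullet_screens:
        return ""
    hottest = max(matched_bullet_screens, key=lambda b: b.get("likes", 0))
    return hottest["content"]
```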
Optionally, the apparatus further comprises:
the fourth identification module is used for identifying a third image of which the expression belongs to a preset expression type in the target video if the playing parameter of the target video does not meet the first preset condition;
and the fourth generation module is used for generating the expression package according to the third image.
Optionally, the determining module 131 includes:
the first judgment submodule is used for judging whether the online time of the target video is greater than or equal to a preset time threshold value or not when the playing parameter comprises the online time;
if the online time of the target video is greater than or equal to a preset time threshold, the playing parameter of the target video meets a first preset condition;
if the online time of the target video is less than a preset time threshold, the playing parameter of the target video does not meet a first preset condition.
Optionally, the determining module 131 includes:
the second judging submodule is used for judging whether the number of the barrages of the target video is greater than or equal to a preset number threshold value and whether the stability of the number of the barrages meets a preset stability condition when the playing parameters comprise the number of the barrages;
if the number of the barrage of the target video is greater than or equal to a preset number threshold and the stability of the number of the barrage meets a preset stability condition, the playing parameter of the target video meets a first preset condition;
if the number of the barrage of the target video is smaller than a preset number threshold, or the stability of the number of the barrage does not meet a preset stability condition, the playing parameter of the target video does not meet a first preset condition.
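The patent does not define the "preset stability condition"; one assumed interpretation, shown below for illustration only, is a bound on the coefficient of variation of recent daily bullet screen counts. The function names and thresholds are hypothetical:

```python
from statistics import mean, pstdev

def bullet_count_stable(daily_counts, max_cv=0.3):
    """Assumed stability test: coefficient of variation of recent daily
    bullet screen counts must stay below max_cv."""
    if len(daily_counts) < 2 or mean(daily_counts) == 0:
        return False
    return pstdev(daily_counts) / mean(daily_counts) <= max_cv

def first_preset_condition_met(bullet_count, daily_counts, min_count=10_000):
    """First preset condition when the playing parameter is the number of
    bullet screens: enough bullet screens and a stable count."""
    return bullet_count >= min_count and bullet_count_stable(daily_counts)
```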
Since the device embodiment is basically similar to the method embodiment, its description is relatively brief; for relevant details, refer to the description of the method embodiment.
According to still another embodiment of the present invention, there is also provided an electronic device, including a memory, a processor, and an expression package generation program stored on the memory and executable on the processor, where the expression package generation program, when executed by the processor, implements the steps of the expression package generation method according to any one of the above embodiments.
According to still another embodiment of the present invention, there is also provided a computer-readable storage medium having an expression package generation program stored thereon, where the expression package generation program, when executed by a processor, implements the steps of the expression package generation method according to any one of the above embodiments.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method for generating an emoticon, the apparatus for generating an emoticon, the electronic device, and the computer-readable storage medium provided by the present invention are described in detail above, and specific examples are applied herein to illustrate the principles and embodiments of the present invention, and the above descriptions of the embodiments are only used to help understand the method and the core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.
Claims (24)
1. An expression package generation method, comprising:
judging whether the playing parameters of the target video meet a first preset condition or not;
if the playing parameters of the target video meet the first preset condition, identifying a first image matched with a target bullet screen in the target video, wherein the target bullet screen is a bullet screen meeting a second preset condition;
generating an expression package according to the first image;
acquiring, from the target video, a second preset number of original frame images that precede and are adjacent to the frame number corresponding to the first image in the expression package;
acquiring subtitle data matched with the original frame image;
identifying a conversation scene according to the subtitle data;
generating text data according to the conversation scene;
adding the text data to the expression package.
2. The method of claim 1,
if the playing parameter of the target video meets the first preset condition, the method further comprises: identifying a second image of which the expression belongs to a preset expression type in the target video; acquiring a same target image between the first image and the second image;
the generating of the expression package according to the first image comprises:
and generating an expression package according to the target image.
3. The method of claim 2,
the identifying a first image in the target video that matches a target barrage includes: identifying first time point information matched with a target bullet screen in the target video; identifying a first image in the target video matched with the first time point information;
before the acquiring the same target image between the first image and the second image, the method further comprises: acquiring second time point information matched with the second image in the target video;
the acquiring the same target image between the first image and the second image comprises: acquiring target time point information which is the same between the first time point information and the second time point information; and screening out a target image matched with the target time point information from the first image or the second image.
4. The method of claim 3, wherein the identifying the first time point information that the target barrage matches in the target video comprises:
acquiring bullet screen heat corresponding to each time point in the target video;
identifying, in descending order of bullet screen heat, first time point information corresponding to each of a first preset number of bullet screen heat values; or identifying, in descending order of bullet screen heat, first time point information corresponding to the bullet screen heat values that are greater than a preset heat threshold.
5. The method of claim 4, wherein the obtaining of the bullet screen heat corresponding to each time point in the target video comprises:
acquiring the number M of bullet screens corresponding to each time point in the target video, wherein M is a positive integer;
acquiring the bullet screen heat degree of each of the M bullet screens;
and acquiring the bullet screen heat corresponding to each time point according to the bullet screen heat of each bullet screen and the bullet screen quantity M.
6. The method of claim 5, wherein said obtaining the bullet screen heat of each of the M bullet screens comprises:
acquiring user evaluation parameters and bullet screen contents of each of M bullet screens;
and acquiring the bullet screen heat degree of each bullet screen in the M bullet screens according to the user evaluation parameters and the bullet screen contents.
7. The method of claim 1, wherein the generating an expression package according to the first image comprises:
acquiring the frame number of the first image in the target video;
identifying, according to the frame numbers, a plurality of first images with consecutive frame numbers among the first images and/or a single first image with an isolated frame number;
generating a dynamic picture from the plurality of first images with consecutive frame numbers to serve as the expression package; and/or generating a static picture from the single first image with an isolated frame number to serve as the expression package; and/or generating a video clip, to serve as the expression package, from a plurality of target original frame images respectively matched with the plurality of first images with consecutive frame numbers.
8. The method of claim 1, wherein after the generating an expression package according to the first image, the method further comprises:
acquiring the first image corresponding to the expression package and the target bullet screen content matched with the first image in the target video;
generating text data according to the target bullet screen content;
adding the text data to the expression package.
9. The method according to claim 1, wherein after determining whether the playing parameter of the target video satisfies a first preset condition, the method further comprises:
if the playing parameter of the target video does not meet the first preset condition, identifying a third image of which the expression belongs to a preset expression type in the target video;
and generating an expression package according to the third image.
10. The method according to claim 1, wherein when the playing parameter includes an online duration, the determining whether the playing parameter of the target video satisfies a first preset condition includes:
judging whether the online time of the target video is greater than or equal to a preset time threshold;
if the online time of the target video is greater than or equal to a preset time threshold, the playing parameter of the target video meets a first preset condition;
and if the online time of the target video is less than a preset time threshold, the playing parameter of the target video does not meet a first preset condition.
11. The method according to claim 1, wherein when the playing parameter includes the number of barrage, the determining whether the playing parameter of the target video satisfies a first preset condition includes:
judging whether the number of the barrages of the target video is greater than or equal to a preset number threshold value and whether the stability of the number of the barrages meets a preset stability condition;
if the number of the barrage of the target video is greater than or equal to a preset number threshold and the stability of the number of the barrage meets a preset stability condition, the playing parameter of the target video meets a first preset condition;
if the number of the barrage of the target video is smaller than a preset number threshold value, or the stability of the number of the barrage does not meet a preset stability condition, the playing parameter of the target video does not meet a first preset condition.
12. An expression package generation apparatus, comprising:
the judging module is used for judging whether the playing parameters of the target video meet a first preset condition or not;
the first identification module is used for identifying a first image matched with a target bullet screen in the target video if the playing parameter of the target video meets the first preset condition, wherein the target bullet screen is a bullet screen meeting a second preset condition;
the first generation module is used for generating an expression package according to the first image;
the device further comprises:
a third obtaining module, configured to obtain, from the target video, a second preset number of original frame images that precede and are adjacent to the frame number corresponding to the first image in the expression package;
the fourth acquisition module is used for acquiring the subtitle data matched with the original frame image;
the third identification module is used for identifying a conversation scene according to the subtitle data;
the second generation module is used for generating text data according to the conversation scene;
and the first adding module is used for adding the text data to the expression package.
13. The apparatus of claim 12, further comprising:
the second identification module is used for identifying a second image of which the expression belongs to a preset expression type in the target video if the playing parameter of the target video meets the first preset condition;
the first acquisition module is used for acquiring a same target image between the first image and the second image;
the first generation module is further used for generating an expression package according to the target image.
14. The apparatus of claim 13,
the first identification module comprises:
the first identification submodule is used for identifying first time point information matched with a target bullet screen in the target video;
the second identification submodule is used for identifying a first image matched with the first time point information in the target video;
the device further comprises:
the second acquisition module is used for acquiring second time point information matched with the second image in the target video;
the first obtaining module comprises:
the first obtaining submodule is used for obtaining the same target time point information between the first time point information and the second time point information;
and the second acquisition sub-module is used for screening out a target image matched with the target time point information from the first image or the second image.
15. The apparatus of claim 14, wherein the first identification submodule comprises:
the third obtaining submodule is used for obtaining the bullet screen heat corresponding to each time point in the target video;
and the third identification submodule is used for identifying, in descending order of bullet screen heat, first time point information corresponding to each of a first preset number of bullet screen heat values, or identifying first time point information corresponding to the bullet screen heat values that are greater than a preset heat threshold.
16. The apparatus of claim 15, wherein the third acquisition submodule comprises:
the first obtaining unit is used for obtaining the number M of the barrages corresponding to each time point in the target video, wherein M is a positive integer;
the second acquisition unit is used for acquiring the bullet screen heat of each of the M bullet screens;
and the third acquisition unit is used for acquiring the bullet screen heat corresponding to each time point according to the bullet screen heat of each bullet screen and the bullet screen quantity M.
17. The apparatus of claim 16, wherein the second obtaining unit comprises:
the first obtaining subunit is used for obtaining the user evaluation parameters and the bullet screen contents of each of the M bullet screens;
and the second obtaining subunit is used for obtaining the bullet screen heat of each bullet screen in the M bullet screens according to the user evaluation parameters and the bullet screen contents.
18. The apparatus of claim 12, wherein the first generating module comprises:
the fourth obtaining submodule is used for obtaining the frame number of the first image in the target video;
a fourth identification submodule, configured to identify, according to the frame numbers, a plurality of first images with consecutive frame numbers among the first images, and/or a single first image with an isolated frame number;
the generation submodule is used for generating a dynamic picture from the plurality of first images with consecutive frame numbers as the expression package; and/or generating a static picture from the single first image with an isolated frame number as the expression package; and/or generating a video clip, as the expression package, from a plurality of target original frame images respectively matched with the plurality of first images with consecutive frame numbers.
19. The apparatus of claim 12, further comprising:
a fifth obtaining module, configured to obtain the first image corresponding to the expression package and the target bullet screen content matched with the first image in the target video;
the third generation module is used for generating text data according to the target bullet screen content;
and the second adding module is used for adding the text data to the expression package.
20. The apparatus of claim 12, further comprising:
the fourth identification module is used for identifying a third image of which the expression belongs to a preset expression type in the target video if the playing parameter of the target video does not meet the first preset condition;
and the fourth generation module is used for generating the expression package according to the third image.
21. The apparatus of claim 12, wherein the determining module comprises:
the first judgment submodule is used for judging whether the online time of the target video is greater than or equal to a preset time threshold value or not when the playing parameter comprises the online time;
if the online time of the target video is greater than or equal to a preset time threshold, the playing parameter of the target video meets a first preset condition;
if the online time of the target video is less than a preset time threshold, the playing parameter of the target video does not meet a first preset condition.
22. The apparatus of claim 12, wherein the determining module comprises:
the second judging submodule is used for judging whether the number of the barrages of the target video is greater than or equal to a preset number threshold value and whether the stability of the number of the barrages meets a preset stability condition when the playing parameters comprise the number of the barrages;
if the number of the barrage of the target video is greater than or equal to a preset number threshold and the stability of the number of the barrage meets a preset stability condition, the playing parameter of the target video meets a first preset condition;
if the number of the barrage of the target video is smaller than a preset number threshold, or the stability of the number of the barrage does not meet a preset stability condition, the playing parameter of the target video does not meet a first preset condition.
23. An electronic device, comprising: a memory, a processor, and an expression package generation program stored on the memory and executable on the processor, wherein the expression package generation program, when executed by the processor, implements the steps of the expression package generation method of any one of claims 1 to 11.
24. A computer-readable storage medium, characterized in that an expression package generation program is stored thereon, which, when executed by a processor, implements the steps of the expression package generation method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910185905.3A CN110049377B (en) | 2019-03-12 | 2019-03-12 | Expression package generation method and device, electronic equipment and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110049377A CN110049377A (en) | 2019-07-23 |
CN110049377B true CN110049377B (en) | 2021-06-22 |
Family
ID=67274781
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910185905.3A Active CN110049377B (en) | 2019-03-12 | 2019-03-12 | Expression package generation method and device, electronic equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110049377B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111274447A (en) * | 2020-01-13 | 2020-06-12 | 深圳壹账通智能科技有限公司 | Target expression generation method, device, medium and electronic equipment based on video |
KR20210118203A (en) | 2020-06-28 | 2021-09-29 | 베이징 바이두 넷컴 사이언스 앤 테크놀로지 코., 엘티디. | Emoji package creation method and device, electronic device and medium |
CN111753131B (en) * | 2020-06-28 | 2024-07-16 | 北京百度网讯科技有限公司 | Expression package generation method and device, electronic device and medium |
CN114693827A (en) * | 2022-04-07 | 2022-07-01 | 深圳云之家网络有限公司 | Expression generation method and device, computer equipment and storage medium |
CN114827648B (en) * | 2022-04-19 | 2024-03-22 | 咪咕文化科技有限公司 | Method, device, equipment and medium for generating dynamic expression package |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104469508A (en) * | 2013-09-13 | 2015-03-25 | 中国电信股份有限公司 | Method, server and system for performing video positioning based on bullet screen information content |
CN106358087A (en) * | 2016-10-31 | 2017-01-25 | 北京小米移动软件有限公司 | Method and device for generating expression package |
CN106658079A (en) * | 2017-01-05 | 2017-05-10 | 腾讯科技(深圳)有限公司 | Customized expression image generation method and device |
CN106921867A (en) * | 2015-12-25 | 2017-07-04 | 北京奇虎科技有限公司 | A kind of video representativeness picture, fragment determine method and apparatus |
CN106951856A (en) * | 2017-03-16 | 2017-07-14 | 腾讯科技(深圳)有限公司 | Bag extracting method of expressing one's feelings and device |
CN107370887A (en) * | 2017-08-30 | 2017-11-21 | 维沃移动通信有限公司 | A kind of expression generation method and mobile terminal |
CN107707967A (en) * | 2017-09-30 | 2018-02-16 | 咪咕视讯科技有限公司 | The determination method, apparatus and computer-readable recording medium of a kind of video file front cover |
CN108038892A (en) * | 2017-11-28 | 2018-05-15 | 北京川上科技有限公司 | Expression, which packs, makees method, apparatus, electronic equipment and computer-readable recording medium |
CN108200463A (en) * | 2018-01-19 | 2018-06-22 | 上海哔哩哔哩科技有限公司 | The generation system of the generation method of barrage expression packet, server and barrage expression packet |
CN108307230A (en) * | 2018-02-07 | 2018-07-20 | 北京奇艺世纪科技有限公司 | A kind of extracting method and device of video highlight segment |
CN108924576A (en) * | 2018-07-10 | 2018-11-30 | 武汉斗鱼网络科技有限公司 | A kind of video labeling method, device, equipment and medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160259502A1 (en) * | 2014-09-10 | 2016-09-08 | Katrina Parrott | Diverse emojis/emoticons |
US9762517B2 (en) * | 2015-03-24 | 2017-09-12 | Unrapp LLC | System and method for sharing multimedia recording in a gift receiving event |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110049377B (en) | Expression package generation method and device, electronic equipment and computer readable storage medium | |
CN110020437B (en) | Emotion analysis and visualization method combining video and barrage | |
CN110582025B (en) | Method and apparatus for processing video | |
CN111541910B (en) | Video barrage comment automatic generation method and system based on deep learning | |
CN109189985B (en) | Text style processing method and device, electronic equipment and storage medium | |
CN101281540B (en) | Apparatus, method and computer program for processing information | |
CN109788345B (en) | Live broadcast control method and device, live broadcast equipment and readable storage medium | |
CN109508406B (en) | Information processing method and device and computer readable storage medium | |
CN111683209A (en) | Mixed-cut video generation method and device, electronic equipment and computer-readable storage medium | |
CN111797820B (en) | Video data processing method and device, electronic equipment and storage medium | |
CN106021496A (en) | Video search method and video search device | |
CN112533051A (en) | Bullet screen information display method and device, computer equipment and storage medium | |
CN110072140B (en) | Video information prompting method, device, equipment and storage medium | |
CN107172482B (en) | Method and device for generating image with interchangeable format | |
CN108595422B (en) | Method for filtering bad multimedia messages | |
CN111984821A (en) | Method and device for determining dynamic cover of video, storage medium and electronic equipment | |
CN110769312A (en) | Method and device for recommending information in live broadcast application | |
CN107133567B (en) | woundplast notice point selection method and device | |
CN111581435A (en) | Video cover image generation method and device, electronic equipment and storage medium | |
CN109062905A (en) | A kind of barrage value of edition evaluation method, device, equipment and medium | |
CN116567351B (en) | Video processing method, device, equipment and medium | |
CN115665508A (en) | Video abstract generation method and device, electronic equipment and storage medium | |
CN112445921B (en) | Digest generation method and digest generation device | |
CN110139134B (en) | Intelligent personalized bullet screen pushing method and system | |
CN113923516A (en) | Video processing method, device and equipment based on deep learning model and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||