
CN113313065A - Video processing method and device, electronic equipment and readable storage medium - Google Patents

Video processing method and device, electronic equipment and readable storage medium Download PDF

Info

Publication number
CN113313065A
Authority
CN
China
Prior art keywords
video
information
preset
current
current video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110700014.4A
Other languages
Chinese (zh)
Inventor
周亮 (Zhou Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd filed Critical Beijing QIYI Century Science and Technology Co Ltd
Priority to CN202110700014.4A priority Critical patent/CN113313065A/en
Publication of CN113313065A publication Critical patent/CN113313065A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a video processing method, a video processing apparatus, electronic equipment and a readable storage medium, wherein the method comprises the following steps: acquiring a current video and time sequence information of the video frames in the current video; extracting a preset number of video key frames from the current video according to the time sequence information; performing image feature analysis on each of the video key frames to obtain corresponding image feature analysis results; performing feature fusion on all the image feature analysis results to obtain video feature information representing the current video; and calculating the similarity between the current video and a preset video based on the video feature information, and judging whether the current video is a repeated video. According to the embodiment of the invention, the video key frames are searched along the time sequence, redundant segments in the video are removed, and the image expression features of the key frames are extracted, so that video feature extraction time is saved while the effectiveness of the video representation is ensured.

Description

Video processing method and device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of multimedia technologies, and in particular, to a video processing method, a video processing apparatus, an electronic device, and a computer-readable storage medium.
Background
The network video industry has developed rapidly, and professional video websites have become important distribution platforms for film and television, variety shows, sports, news and other content, reshaping how people spend their leisure time; network video has become an indispensable part of modern life.
The short-video ecosystem is increasingly popular, and users freely upload and share small videos of their daily lives. However, a large portion of the resulting mass of video data consists of repeated videos. Repeated videos on the current network mainly arise from: adding watermarks to a video, changing the video format, changing the video frame rate, adding or removing opening and closing credits, cutting and splicing different videos, and the like. The traditional method of identifying videos with the DHash (difference hash) algorithm cannot recognize these various ways of disguising repeated videos.
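For context, a minimal sketch of the conventional DHash approach, assuming Pillow is available (the function names and hash size are illustrative, not taken from this patent):

```python
from PIL import Image

def dhash(image_path: str, hash_size: int = 8) -> int:
    """Difference hash: resize to (hash_size+1) x hash_size grayscale,
    then record whether each pixel is brighter than its right neighbor."""
    img = Image.open(image_path).convert("L").resize((hash_size + 1, hash_size))
    pixels = list(img.getdata())
    bits = 0
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits = (bits << 1) | (left > right)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

# Two frames whose hashes differ in only a few bits are likely duplicates,
# but a watermark, re-encode, or added credits can push the distance up,
# which is the weakness the patent targets.
```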
Disclosure of Invention
In view of the above problems, embodiments of the present invention are proposed to provide a video processing method and a corresponding video processing apparatus, an electronic device, and a computer-readable storage medium that overcome or at least partially solve the above problems.
The embodiment of the invention discloses a video processing method, which comprises the following steps:
acquiring a current video and time sequence information of a video frame in the current video;
extracting a preset number of video key frames from the current video according to the time sequence information;
respectively carrying out image characteristic analysis on the video key frames to obtain corresponding image characteristic analysis results;
performing feature fusion on all the image feature analysis results to obtain video feature information for representing the current video;
and calculating the similarity between the current video and a preset video based on the video characteristic information, and judging whether the current video is a repeated video.
Optionally, the extracting a preset number of video key frames from the current video according to the timing information includes:
acquiring the duration information of the current video;
calculating the ratio of the video length corresponding to the duration information to the preset number, and taking the ratio as a sampling interval;
and extracting, according to the time sequence information, the preset number of video key frames from the current video at equal intervals of the sampling interval.
Optionally, the performing image feature analysis on the video key frames respectively to obtain corresponding image feature analysis results includes:
and respectively inputting the video key frames into a preset convolutional neural network model to obtain corresponding key frame vector characteristic information, and taking the key frame vector characteristic information as the image characteristic analysis result.
Optionally, the video feature information includes video vector feature information, and the performing feature fusion on all the image feature analysis results to obtain video feature information for characterizing the current video includes:
taking the playing sequence of the preset number of the video key frames in the current video as the splicing sequence of the image feature analysis result;
and sequentially splicing the image feature analysis results according to the splicing sequence to obtain the video vector feature information for representing the current video.
Optionally, the calculating a similarity between the current video and a preset video based on the video feature information, and determining whether the current video is a repeated video includes:
acquiring preset video characteristic information of the preset video;
calculating the cosine distance between the video characteristic information and the preset video characteristic information to obtain a corresponding cosine distance value;
taking the cosine distance value as the video similarity between the current video and the preset video;
and judging whether the current video is a repeated video or not according to the video similarity.
Optionally, the determining whether the current video is a repeated video according to the video similarity includes:
determining a maximum video similarity from the plurality of video similarities;
judging whether the maximum video similarity is greater than a preset similarity threshold value or not;
and if the maximum video similarity is greater than the preset similarity threshold, judging that the current video is a repeated video.
Optionally, the preset video has tag information, and the determining whether the current video is a repeated video further includes:
if the current video is judged to be the repeated video, taking the preset video with the maximum video similarity as a target similar video;
acquiring target label information of the target similar video;
configuring current tag information of the current video as the target tag information;
and selecting one of the videos with the same target label information for video recommendation.
Optionally, the preset convolutional neural network model is a residual network model; the residual network model comprises at least one convolutional layer and a pooling layer connected in series, wherein the convolutional layer performs feature extraction on an input image, and the pooling layer performs average pooling on the image features output by the last convolutional layer to obtain image feature vectors of preset dimensions.
The embodiment of the invention also discloses a video processing device, which comprises:
the first acquisition module is used for acquiring a current video and time sequence information of a video frame in the current video;
the extraction module is used for extracting a preset number of video key frames from the current video according to the time sequence information;
the image characteristic analysis module is used for respectively carrying out image characteristic analysis on the video key frames to obtain corresponding image characteristic analysis results;
the characteristic fusion module is used for carrying out characteristic fusion on all the image characteristic analysis results to obtain video characteristic information used for representing the current video;
and the judging module is used for calculating the similarity between the current video and a preset video based on the video characteristic information and judging whether the current video is a repeated video.
Optionally, the extraction module includes:
the first obtaining submodule is used for obtaining the duration information of the current video;
the first calculation submodule is used for calculating the ratio of the video length corresponding to the duration information to the preset number and taking the ratio as a sampling interval;
and the extraction submodule is used for sequentially extracting the preset number of video key frames from the current video at equal intervals according to the sampling interval according to the time sequence information.
Optionally, the image feature analysis module includes:
and the input submodule is used for respectively inputting the video key frames into a preset convolutional neural network model to obtain corresponding key frame vector characteristic information, and taking the key frame vector characteristic information as the image characteristic analysis result.
Optionally, the video feature information includes video vector feature information, and the feature fusion module includes:
the first determining submodule is used for taking the playing sequence of the preset number of the video key frames in the current video as the splicing sequence of the image characteristic analysis result;
and the splicing submodule is used for sequentially splicing the image feature analysis results according to the splicing sequence to obtain the video vector feature information used for representing the current video.
Optionally, the determining module includes:
the second obtaining submodule is used for obtaining preset video characteristic information of the preset video;
the second calculation submodule is used for calculating the cosine distance between the video characteristic information and the preset video characteristic information to obtain a corresponding cosine distance value;
the second determining submodule is used for taking the cosine distance value as the video similarity between the current video and the preset video;
and the judging submodule is used for judging whether the current video is a repeated video according to the video similarity.
Optionally, the determining sub-module includes:
a determining unit, configured to determine a maximum video similarity from the plurality of video similarities;
the first judgment unit is used for judging whether the maximum video similarity is greater than a preset similarity threshold value or not;
and the second judging unit is used for judging that the current video is a repeated video if the maximum video similarity is greater than the preset similarity threshold.
Optionally, the preset video has tag information, and the apparatus further includes:
the determining module is used for taking the preset video with the maximum video similarity as a target similar video if the current video is judged to be the repeated video;
the second acquisition module is used for acquiring target label information of the target similar video;
a configuration module, configured to configure current tag information of the current video as the target tag information;
and the selecting module is used for selecting one of the videos with the same target tag information to recommend the video.
Optionally, the preset convolutional neural network model is a residual network model; the residual network model comprises at least one convolutional layer and a pooling layer connected in series, wherein the convolutional layer performs feature extraction on an input image, and the pooling layer performs average pooling on the image features output by the last convolutional layer to obtain image feature vectors of preset dimensions.
The embodiment of the invention also discloses an electronic device, which comprises: a processor, a memory and a computer program stored on said memory and capable of running on said processor, said computer program, when executed by said processor, implementing the steps of a video processing method as described above.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the video processing method are realized.
The embodiment of the invention has the following advantages:
In the embodiment of the invention, video key frames are searched along the time sequence and their image features are extracted, so that the image expression features of the key frames serve as the video features representing the video when identifying repeated videos. Extracting a preset number of key frames from the current video to express its content removes redundant segments from the video. Performing image feature analysis only on the key frames, rather than on the whole video, avoids the long feature extraction time and large amount of computing resources that whole-video analysis requires. Taking the image feature analysis results of the preset number of key frames as the features of the whole video when judging whether the current video is a repeated video therefore improves video feature extraction efficiency while ensuring the effectiveness of the video representation and the accuracy of repeated-video identification.
Drawings
FIG. 1 is a flow chart of the steps of a video processing method according to an embodiment of the invention;
FIG. 2 is a flow chart of steps of another video processing method of an embodiment of the present invention;
FIG. 3 is a flow chart of a video processing method according to an embodiment of the invention;
fig. 4 is a block diagram of a video processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
Video sharing has become increasingly popular; however, a large share of the mass of video data on video websites consists of repeated videos, whose presence degrades the user's viewing experience.
In addition, identifying repeated videos is of great significance for protecting video copyright and combating piracy, for reducing the exposure of repeated advertisement videos, and for recommending videos according to their content.
One of the core ideas of the embodiment of the invention is that video key frames are searched along the time sequence and image features are extracted from them, so that the image expression features of the key frames serve as the video features representing the video when identifying repeated videos. Extracting a preset number of key frames from the current video to express its content removes redundant segments from the video; performing image feature analysis only on the key frames, rather than on the whole video, avoids the long feature extraction time and large amount of computing resources that whole-video analysis requires; and taking the image feature analysis results of the key frames as the features of the whole video when judging whether the current video is a repeated video improves video feature extraction efficiency while ensuring the effectiveness of the video representation and the accuracy of repeated-video identification.
Referring to fig. 1, a flowchart illustrating steps of a video processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 101, acquiring a current video and time sequence information of a video frame in the current video.
In the embodiment of the invention, a plurality of videos can be acquired, and the video frame timing information of the plurality of videos can be acquired.
And step 102, extracting a preset number of video key frames from the current video according to the time sequence information.
Video key frames extracted from the current video according to the time sequence information can represent its video content. A preset number of key frames is extracted so that they represent the current video effectively; experimental verification shows that, in practical applications, extracting 5 key frames per video covers the whole video from beginning to end.
Various methods may be used to extract the key frames of a video. In one example, extraction may start after the video has played for 30 seconds; in another, key frames may be extracted after a specific element is recognized. The method of extracting key frames from the current video can be set according to the actual needs of the user, and the embodiment of the present invention does not limit it.
And 103, respectively carrying out image characteristic analysis on the video key frames to obtain corresponding image characteristic analysis results.
After the preset number of video key frames are extracted, image feature analysis is performed on each of them. For example, after 5 key frames are extracted from a video, image feature analysis is performed on all 5, yielding 5 corresponding image feature analysis results.
An image feature analysis result refers to the image expression features obtained by analyzing the key frame image; different analysis methods yield different expression features. Image features include the color, texture, shape and spatial-relationship features of an image, and in the embodiment of the invention the extracted features are mainly the semantic features of the image.
The image feature analysis may be deep image feature analysis, for which a Convolutional Neural Network (CNN) may be used. A CNN is a feedforward neural network with a deep structure that includes convolution computations, and it is one of the representative algorithms of deep learning; CNNs perform well in image classification and retrieval. Studies show that when an image passes through a CNN, different depths correspond to different image feature information: lower layers usually capture detail features, while higher layers capture more semantic features. High-level semantic features are a good way of encoding an image and work well for measuring image similarity. Therefore, in the embodiment of the present invention, a convolutional neural network may be used for deep image feature analysis to obtain the image semantic features of the video key frames.
And 104, performing feature fusion on all the image feature analysis results to obtain video feature information for representing the current video.
In the embodiment of the invention, the image feature analysis results corresponding to the video key frames are fused into the video feature information representing the current video; that is, the key-frame analysis results are adopted as the video features of the current video. Compared with the traditional approach of extracting video features from the whole video, this improves the efficiency of video feature extraction.
And 105, calculating the similarity between the current video and a preset video based on the video characteristic information, and judging whether the current video is a repeated video.
In the embodiment of the present invention, a video database storing a plurality of preset videos may be prepared in advance. In one example, the video feature information of the current video is compared with that of each preset video; if they are identical, the current video is judged to be a repeated video, and otherwise it is not. In another example, the similarity between the video feature information of the current video and that of a preset video is calculated, and if it exceeds a set threshold, for example 90%, the current video is judged to be a repeated video.
In summary, in the embodiment of the present invention, video key frames are searched along the time sequence and their image features are extracted, so that the image expression features of the key frames serve as the video features representing the video when identifying repeated videos. Extracting a preset number of key frames from the current video to express its content removes redundant segments from the video; performing image feature analysis only on the key frames, rather than on the whole video, avoids the long feature extraction time and large amount of computing resources that whole-video analysis requires; and taking the key-frame analysis results as the features of the whole video when judging whether the current video is a repeated video improves feature extraction efficiency while ensuring the effectiveness of the video representation and the accuracy of repeated-video identification.
Referring to fig. 2, a flowchart illustrating steps of another video processing method according to an embodiment of the present invention is shown, which may specifically include the following steps:
step 201, obtaining a current video and time sequence information of a video frame in the current video.
In the embodiment of the invention, a batch of current videos, for example advertisement videos, can be obtained together with their time sequence information.
Step 202, extracting a preset number of video key frames from the current video according to the time sequence information.
The video key frames can represent the video content of the current video, and in a preferred embodiment, the following steps can be adopted to obtain the video key frames in the video.
And a substep S11, obtaining duration information of the current video.
The duration information of the current video indicates its video length; in one example, it may be the frame length of the video (i.e., the total number of video frames).
And a substep S12, calculating a ratio of the video length corresponding to the duration information to the preset number, and taking the ratio as a sampling interval.
For each video, the sampling interval for key-frame extraction is derived from the ratio of the video's length to the preset number of key frames. For example, if a video has length L and 5 key frames are to be extracted, the extraction interval is D = L/6, i.e., the length divided by the key-frame count plus one, so that the selected frames fall strictly inside the video. In practice, those skilled in the art can set the number and interval of extracted key frames by weighing the effectiveness of the video representation against the cost of computing video similarity.
And a substep S13, sequentially extracting the preset number of video key frames from the current video at equal intervals according to the time sequence information.
In the embodiment of the invention, frames are extracted from each video at equal intervals of the sampling interval: if D = L/6, frames L/6, 2L/6, 3L/6, 4L/6 and 5L/6 are extracted as the video key frames. For example, if L = 210 frames, then D = 35 and the extracted key frames are frames 35, 70, 105, 140 and 175.
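A minimal sketch of this equal-interval selection (names are illustrative; the divisor is the key-frame count plus one, matching the worked example above):

```python
def key_frame_indices(total_frames: int, num_key_frames: int = 5) -> list:
    """Frame indices k * L / (n + 1) for k = 1..n,
    e.g. L = 210, n = 5 gives [35, 70, 105, 140, 175]."""
    interval = total_frames / (num_key_frames + 1)
    return [round(k * interval) for k in range(1, num_key_frames + 1)]

assert key_frame_indices(210) == [35, 70, 105, 140, 175]
```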
The key-frame extraction method given here is only a preferred embodiment; those skilled in the art may optimize or adjust it according to the actual situation. For example, the number of key frames may be determined by the length of the video, or, for certain types of video whose content a single frame can represent, only the key frame at the middle position may be extracted. The embodiment of the present invention does not limit the manner in which key frames are extracted.
Step 203, inputting the video key frames into a preset convolutional neural network model respectively to obtain corresponding key frame vector characteristic information, and taking the key frame vector characteristic information as the image characteristic analysis result.
And inputting each obtained video key frame into a pre-trained convolutional neural network model, and outputting key frame vector characteristic information of the video key frame, wherein the key frame vector characteristic information comprises image semantic characteristic information of the video key frame.
In the invention, the preset convolution neural network model can be a residual error network model; the residual network model includes at least one convolutional layer and one pooling layer in series. The convolution layer is used for carrying out feature extraction on an input image; the pooling layer is used for performing average pooling on the image features output by the last convolution layer to obtain image feature vectors with preset dimensions.
In an embodiment, considering both the accuracy and the speed of image feature analysis, a pre-trained ResNet-18 (residual network) may be used as the preset convolutional neural network model. The ResNet-18 network crops the input key frame image to 224 × 224 by default and normalizes it; after the image passes through the network, the image features of the last convolutional layer are extracted and reduced by average pooling (avgpool) to a 512-dimensional vector, which is the image feature analysis result of the corresponding key frame. In actual operation, so that the results are comparable, the key frame vector feature information output for the key frames of any video is 512-dimensional vector feature information, as determined by the structure of the preset convolutional neural network model.
In addition, the ResNet-18 model mentioned above is merely a preferred convolutional neural network model; those skilled in the art may use different models to extract different image features from the video key frames as needed, and the embodiment of the present invention is not limited in this respect.
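As a sketch of this step, assuming PyTorch and torchvision (the patent does not name a framework), the classification head of a pre-trained ResNet-18 can be dropped so that the global-average-pooled 512-dimensional vector is returned directly:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

# ImageNet-style preprocessing: resize, crop to 224 x 224 and normalize,
# mirroring the default input handling described above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.eval()
# Keep everything up to and including the global average pool; drop the fc head.
feature_extractor = nn.Sequential(*list(resnet.children())[:-1])

@torch.no_grad()
def key_frame_vector(frame: Image.Image) -> torch.Tensor:
    x = preprocess(frame).unsqueeze(0)     # shape (1, 3, 224, 224)
    return feature_extractor(x).flatten()  # 512-dimensional feature vector
```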
The video feature information is video vector feature information, and after each video obtains a preset number of pieces of key frame vector feature information, each piece of key frame vector feature information can be combined into video vector feature information to represent the video, and the specific method can refer to steps 204 to 205.
And 204, taking the playing sequence of the preset number of the video key frames in the current video as the splicing sequence of the image feature analysis result.
And taking the playing sequence of the video key frames in the current video as the splicing sequence of the corresponding key frame vector characteristic information.
And step 205, sequentially stitching the image feature analysis results according to the stitching sequence to obtain the video vector feature information for representing the current video.
In an embodiment, the 5 video key frames are passed through the preset convolutional neural network model to output five 512-dimensional vectors, which are spliced in sequence into a 2560-dimensional vector serving as the video vector feature information of the current video.
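Continuing the sketch with the same illustrative names, the splicing step is a simple concatenation in playback order:

```python
import torch

def video_vector(key_frame_vectors: list) -> torch.Tensor:
    """Splice per-frame features in playback order: five 512-d vectors -> 2560-d."""
    return torch.cat(key_frame_vectors, dim=0)

# e.g. video_vec = video_vector([key_frame_vector(f) for f in key_frames])
```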
And step 206, calculating the similarity between the current video and a preset video based on the video characteristic information, and judging whether the current video is a repeated video.
In the embodiment of the present invention, the current video may be compared with the preset videos to determine whether the current video is a repeated video of one of the preset videos, a specific comparison method is performed based on the video feature information, and in a preferred example, for step 206, the following steps may be performed:
and a substep S21 of obtaining preset video characteristic information of the preset video.
After a preset video is obtained from the preset video database, its preset video feature information can be obtained. It should be noted that this feature information is computed in the same way as that of the current video; that is, it is also a 2560-dimensional vector.
And a substep S22, calculating a cosine distance between the video characteristic information and the preset video characteristic information to obtain a corresponding cosine distance value.
In the embodiment of the present invention, the video similarity is calculated as a cosine distance: the cosine of the angle between the video feature information of the current video and the preset video feature information of the preset video, i.e., between two 2560-dimensional vectors, so that a larger value indicates greater similarity.
And a substep S23, taking the cosine distance value as the video similarity between the current video and the preset video.
The cosine distance value obtained this way serves as the video similarity between the current video and the preset video, i.e., the similarity between the two videos. The embodiment of the invention measures video similarity with the cosine distance; other methods may also be used to measure the similarity between videos, and the embodiment of the present invention does not limit this.
And a substep S24, judging whether the current video is a repeated video according to the video similarity.
If the cosine distance value obtained by calculation is larger than a preset threshold value, the current video can be judged to be a repeated video.
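A sketch of this comparison under the same assumptions as above; the "cosine distance value" of the text is the plain cosine of the angle, so larger means more similar:

```python
import torch
import torch.nn.functional as F

def cosine_value(a: torch.Tensor, b: torch.Tensor) -> float:
    """Cosine of the angle between two 1-D feature vectors, in [-1, 1]."""
    return F.cosine_similarity(a, b, dim=0).item()
```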
Before the key-frame-based video feature extraction method was adopted, the online video identification method extracted the first frame of each video with the DHash algorithm to obtain video features and then performed video similarity identification. To compare the two, 1200 online videos were randomly selected and manually grouped: 315 videos had no similar video and formed 315 groups; the remaining 885 videos had similar videos and were divided into 226 groups, each group containing between 2 and 10 videos. The proposed video identification method and the traditional DHash method were each evaluated, a grouping being counted as correct if and only if it was exactly the same as the real video grouping. Compared with the traditional DHash method, the method provided by the invention greatly improves the accuracy.
In addition, for the sub-step S24, how to determine that the current video is the repeated video, the following steps may be specifically performed:
determining a maximum video similarity from the plurality of video similarities; judging whether the maximum video similarity is greater than a preset similarity threshold value or not; and if the maximum video similarity is greater than the preset similarity threshold, judging that the current video is a repeated video.
The video similarity between the current video and each of a plurality of preset videos is calculated, yielding a plurality of cosine distance values. The maximum cosine distance value is selected from these and compared with a preset cosine distance threshold; if the maximum value is greater than the threshold, the current video is judged to be a repeat of the preset video having that maximum cosine distance value.
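In code form, a sketch reusing `cosine_value` from above (the 0.96 threshold is the example value given later in the text):

```python
def most_similar(current_vec, library):
    """library: iterable of (video_id, 2560-d vector); returns the closest
    preset video's id and the corresponding cosine value."""
    best_id, best_sim = None, -1.0
    for video_id, vec in library:
        sim = cosine_value(current_vec, vec)
        if sim > best_sim:
            best_id, best_sim = video_id, sim
    return best_id, best_sim

def is_repeated(current_vec, library, threshold: float = 0.96) -> bool:
    """Duplicate iff the maximum similarity exceeds the preset threshold."""
    return most_similar(current_vec, library)[1] > threshold
```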
In addition, for the repeated videos, the repeated videos may also be grouped, and specifically, the following steps may be performed:
and if the current video is judged to be the repeated video, taking the preset video with the maximum video similarity as a target similar video.
If the current video is determined to be a repeated video, the preset video with the largest cosine distance to the current video is its most similar video, i.e., the target similar video.
And acquiring target label information of the target similar video.
In an example, each preset video has tag information, and the target tag information corresponding to the target similar video is obtained. The tag information of a preset video may be set according to the actual needs of the user; for example, the video tag of a certain preset video may be set to A.
And configuring the current label information of the current video as the target label information.
The tag information of the current video is configured to be the same as that of the target similar video; that is, if the target tag information of the target similar video is A, the current video is also tagged A.
And selecting one of the videos with the same target label information for video recommendation.
By configuring tags for current videos, all video data can be grouped by tag. In one example, when an advertisement video needs to be recommended to a user, one advertisement video is selected from the videos sharing the same tag, which prevents repeated videos from being recommended and improves the user's viewing experience on the video website.
Particularly, if a preset video similar to the current video is not found after all preset videos are traversed, that is, the current video is not a repeated video, new tag information may be set for the current video, and the current video is stored in the video database.
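The grouping logic of this passage and the preceding one, sketched with hypothetical structures (`most_similar` is the helper defined above):

```python
from itertools import count

_fresh_tags = count()  # source of new tag ids for non-repeated videos

def assign_tag(current_vec, library, tags, threshold: float = 0.96):
    """library: list of (video_id, vector); tags: dict video_id -> tag.
    A repeated video inherits the tag of its most similar preset video;
    otherwise a new tag is created and the video is added to the library."""
    best_id, best_sim = most_similar(current_vec, library)
    if best_id is not None and best_sim > threshold:
        return tags[best_id]
    tag = f"group-{next(_fresh_tags)}"
    new_id = f"video-{len(library)}"
    library.append((new_id, current_vec))
    tags[new_id] = tag
    return tag
```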
In order to enable those skilled in the art to better understand steps 201 to 206 of the embodiment of the present invention, the following description is made by way of an example:
Fig. 3 is a flowchart illustrating a video processing method according to an embodiment of the invention. For a video of frame length L, 5 video key frames are extracted at equal intervals, namely frames L/6, 2L/6, 3L/6, 4L/6 and 5L/6. Each key frame is input into the pre-trained convolutional neural network model for image feature extraction, and the output key frame vector feature information is spliced in sequence to obtain the video vector feature information. For this vector, the cosine distance to the preset video vector feature information of each preset video in the video library is calculated; the feature with the largest cosine distance is the most similar video feature, the corresponding video is the most similar video of the current video, and the largest cosine distance value is the similarity between the two. If this similarity exceeds a set threshold (such as 0.96), the current video and that video are considered repeated videos and the current video is given the same tag as its most similar video; otherwise, a new tag is set for the current video and it is added to the video library.
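The whole flow of fig. 3, composed from the sketches above (`read_frame` is a hypothetical decoder callback; OpenCV or any frame reader would do):

```python
def process_video(total_frames: int, read_frame, library, tags) -> str:
    """read_frame(i) -> PIL.Image for frame index i of the current video.
    Returns the tag assigned to the video (existing group or a new one)."""
    indices = key_frame_indices(total_frames, num_key_frames=5)
    vec = video_vector([key_frame_vector(read_frame(i)) for i in indices])
    return assign_tag(vec, library, tags)
```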
According to this key-frame-based feature extraction method, video key frames are searched along the time sequence, which removes redundant segments and expresses the video content more compactly; image features are extracted with a CNN, which efficiently obtains the image expression features of the key frames; and finally the key frame vector feature information is fused into the video features, which solves the time cost of video feature extraction while guaranteeing the recognition of repeated videos. In addition, feature similarity between videos is calculated to set a tag for each advertisement video, thereby grouping similar advertisement videos. Specifically, for a new video feature, its similarity to every feature in the video feature library is calculated and the maximum is selected as the most similar feature; if that maximum exceeds a given threshold, the new video is placed in the group of its most similar video, and otherwise a new group is created for it. With a group tag configured for each advertisement video, recommending videos by tag prevents users from watching repeated videos and improves their viewing experience on the video platform.
In summary, in the embodiment of the present invention, video key frames are searched along the time sequence and their image features are extracted, so that the image expression features of the key frames serve as the video features representing the video when identifying repeated videos. Extracting a preset number of key frames from the current video to express its content removes redundant segments from the video; performing image feature analysis only on the key frames, rather than on the whole video, avoids the long feature extraction time and large amount of computing resources that whole-video analysis requires; and taking the key-frame analysis results as the features of the whole video when judging whether the current video is a repeated video improves feature extraction efficiency while ensuring the effectiveness of the video representation and the accuracy of repeated-video identification.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 4, a block diagram of a video processing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a first obtaining module 401, configured to obtain a current video and timing information of a video frame in the current video;
an extracting module 402, configured to extract a preset number of video key frames from the current video according to the timing information;
an image feature analysis module 403, configured to perform image feature analysis on the video keyframes, respectively, to obtain corresponding image feature analysis results;
a feature fusion module 404, configured to perform feature fusion on all the image feature analysis results to obtain video feature information used for representing the current video;
a determining module 405, configured to calculate a similarity between the current video and a preset video based on the video feature information, and determine whether the current video is a repeated video.
In an embodiment of the present invention, the extraction module includes:
the first obtaining submodule is used for obtaining the duration information of the current video;
the first calculation submodule is used for calculating the ratio of the video length corresponding to the duration information to the preset number and taking the ratio as a sampling interval;
and the extraction submodule is used for sequentially extracting the preset number of video key frames from the current video at equal intervals according to the sampling interval according to the time sequence information.
In an embodiment of the present invention, the image feature analysis module includes:
and the input submodule is used for respectively inputting the video key frames into a preset convolutional neural network model to obtain corresponding key frame vector characteristic information, and taking the key frame vector characteristic information as the image characteristic analysis result.
In this embodiment of the present invention, the video feature information includes video vector feature information, and the feature fusion module includes:
the first determining submodule is used for taking the playing sequence of the preset number of the video key frames in the current video as the splicing sequence of the image characteristic analysis result;
and the splicing submodule is used for sequentially splicing the image feature analysis results according to the splicing sequence to obtain the video vector feature information used for representing the current video.
In an embodiment of the present invention, the determining module includes:
the second obtaining submodule is used for obtaining preset video characteristic information of the preset video;
the second calculation submodule is used for calculating the cosine distance between the video characteristic information and the preset video characteristic information to obtain a corresponding cosine distance value;
the second determining submodule is used for taking the cosine distance value as the video similarity between the current video and the preset video;
and the judging submodule is used for judging whether the current video is a repeated video according to the video similarity.
In an embodiment of the present invention, the determining sub-module includes:
a determining unit, configured to determine a maximum video similarity from the plurality of video similarities;
the first judgment unit is used for judging whether the maximum video similarity is greater than a preset similarity threshold value or not;
and the second judging unit is used for judging that the current video is a repeated video if the maximum video similarity is greater than the preset similarity threshold.
In this embodiment of the present invention, the preset video has tag information, and the apparatus further includes:
the determining module is used for taking the preset video with the maximum video similarity as a target similar video if the current video is judged to be the repeated video;
the second acquisition module is used for acquiring target label information of the target similar video;
a configuration module, configured to configure current tag information of the current video as the target tag information;
and the selecting module is used for selecting one of the videos with the same target tag information to recommend the video.
In the embodiment of the invention, the preset convolutional neural network model is a residual network model; the residual network model comprises at least one convolutional layer and a pooling layer connected in series, wherein the convolutional layer performs feature extraction on an input image, and the pooling layer performs average pooling on the image features output by the last convolutional layer to obtain image feature vectors of preset dimensions.
In summary, in the embodiment of the present invention, video key frames are searched along the time sequence and their image features are extracted, so that the image expression features of the key frames serve as the video features representing the video when identifying repeated videos. Extracting a preset number of key frames from the current video to express its content removes redundant segments from the video; performing image feature analysis only on the key frames, rather than on the whole video, avoids the long feature extraction time and large amount of computing resources that whole-video analysis requires; and taking the key-frame analysis results as the features of the whole video when judging whether the current video is a repeated video improves feature extraction efficiency while ensuring the effectiveness of the video representation and the accuracy of repeated-video identification.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
An embodiment of the present invention further provides an electronic device, including a processor, a memory, and a computer program stored in the memory and executable on the processor. When executed by the processor, the computer program implements each process of the above video processing method embodiment and achieves the same technical effect; to avoid repetition, the details are not repeated here.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned video processing method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The foregoing detailed description has provided a video processing method, a video processing apparatus, an electronic device, and a computer-readable storage medium, which are provided by the present invention, and the present invention has been described in detail by applying specific examples to explain the principles and embodiments of the present invention, where the descriptions of the above examples are only used to help understand the method and the core ideas of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (11)

1. A method of video processing, the method comprising:
acquiring a current video and time sequence information of video frames in the current video;
extracting a preset number of video key frames from the current video according to the time sequence information;
performing image feature analysis on each of the video key frames to obtain corresponding image feature analysis results;
performing feature fusion on all of the image feature analysis results to obtain video feature information for characterizing the current video;
and calculating a similarity between the current video and a preset video based on the video feature information, and determining whether the current video is a duplicate video.
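Illustrative sketch, not part of the claims: a minimal end-to-end rendering of the claimed pipeline in Python, assuming OpenCV for decoding. The names `extract_features`, `num_keyframes`, and the 0.9 threshold are hypothetical stand-ins for the preset model, preset number, and preset similarity threshold.

```python
import cv2
import numpy as np

def is_duplicate_video(video_path, preset_feature, extract_features,
                       num_keyframes=8, threshold=0.9):
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))      # time sequence information
    interval = max(total // num_keyframes, 1)           # sampling interval (claim 2)
    feats = []
    for i in range(num_keyframes):                      # equal-interval key frames
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * interval)
        ok, frame = cap.read()                          # frame is BGR; convert if needed
        if not ok:
            break
        feats.append(extract_features(frame))           # per-frame analysis (claim 3)
    cap.release()
    video_feature = np.concatenate(feats)               # fusion by splicing (claim 4)
    sim = float(np.dot(video_feature, preset_feature) /
                (np.linalg.norm(video_feature) * np.linalg.norm(preset_feature)))
    return sim > threshold, sim                         # similarity decision (claims 5-6)
```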
2. The method of claim 1, wherein the extracting a preset number of video key frames from the current video according to the time sequence information comprises:
acquiring duration information of the current video;
calculating a ratio of the video length indicated by the duration information to the preset number, and taking the ratio as a sampling interval;
and extracting, according to the time sequence information, the preset number of video key frames from the current video at equal intervals determined by the sampling interval.
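A one-function sketch of the interval arithmetic in claim 2; the helper name is hypothetical and integer division stands in for the claimed ratio.

```python
# Hypothetical helper: sampling interval = video length / preset number,
# then key frames are taken at equal multiples of that interval.
def keyframe_indices(total_frames: int, preset_number: int) -> list[int]:
    interval = max(total_frames // preset_number, 1)   # the claimed ratio, floored
    return [i * interval for i in range(preset_number) if i * interval < total_frames]
```

For example, a 3000-frame video with a preset number of 8 yields an interval of 375 and indices 0, 375, ..., 2625.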
3. The method according to claim 1, wherein the performing image feature analysis on each of the video key frames to obtain corresponding image feature analysis results comprises:
inputting each of the video key frames into a preset convolutional neural network model to obtain corresponding key frame vector feature information, and taking the key frame vector feature information as the image feature analysis results.
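The claim does not name a particular network. A common reading uses an off-the-shelf ImageNet backbone; the sketch below takes torchvision's ResNet-50 as one possible "preset convolutional neural network model" (an assumption, not stated in the claim).

```python
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = torch.nn.Identity()   # keep the pooled 2048-d vector, drop the classifier
model.eval()

preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256), T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frame):
    # frame: HxWx3 uint8 RGB array (convert from BGR first if decoding with OpenCV)
    x = preprocess(frame).unsqueeze(0)
    return model(x).squeeze(0).numpy()   # key frame vector feature information
```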
4. The method according to claim 3, wherein the video feature information comprises video vector feature information, and the performing feature fusion on all of the image feature analysis results to obtain video feature information for characterizing the current video comprises:
taking the playback order of the preset number of video key frames in the current video as the splicing order of the image feature analysis results;
and sequentially splicing the image feature analysis results according to the splicing order to obtain the video vector feature information for characterizing the current video.
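Claim 4's fusion is plain concatenation in playback order; with a ResNet-50 backbone as assumed above, eight 2048-dimensional key-frame vectors splice into one 16384-dimensional video vector. A minimal sketch:

```python
import numpy as np

# Splice per-key-frame vectors in the order the frames play in the video.
def fuse(keyframe_vectors: list[np.ndarray]) -> np.ndarray:
    return np.concatenate(keyframe_vectors)   # video vector feature information
```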
5. The method according to claim 1, wherein the calculating a similarity between the current video and a preset video based on the video feature information and determining whether the current video is a duplicate video comprises:
acquiring preset video feature information of the preset video;
calculating a cosine distance between the video feature information and the preset video feature information to obtain a corresponding cosine distance value;
taking the cosine distance value as the video similarity between the current video and the preset video;
and determining whether the current video is a duplicate video according to the video similarity.
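Note that the claim takes the cosine value itself as the similarity score (elsewhere "cosine distance" sometimes means 1 minus this value). A minimal sketch, assuming non-zero vectors:

```python
import numpy as np

# Cosine of the angle between the two video feature vectors; 1.0 means identical direction.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```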
6. The method of claim 5, wherein the determining whether the current video is a duplicate video according to the video similarity comprises:
determining a maximum video similarity from among the video similarities calculated for a plurality of preset videos;
determining whether the maximum video similarity is greater than a preset similarity threshold;
and if the maximum video similarity is greater than the preset similarity threshold, determining that the current video is a duplicate video.
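A sketch of claim 6's decision rule; the 0.9 value is a hypothetical choice for the preset similarity threshold.

```python
import numpy as np

# Score against every preset video, keep the maximum similarity,
# and flag a duplicate when it exceeds the preset threshold.
def judge_duplicate(current_feat, preset_feats, threshold=0.9):
    sims = [float(np.dot(current_feat, f) /
                  (np.linalg.norm(current_feat) * np.linalg.norm(f)))
            for f in preset_feats]
    best = int(np.argmax(sims))
    return sims[best] > threshold, best   # (duplicate flag, index of best match)
```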
7. The method according to claim 5 or 6, wherein the preset video has tag information, and the determining whether the current video is a duplicate video further comprises:
if the current video is determined to be a duplicate video, taking the preset video with the maximum video similarity as a target similar video;
acquiring target tag information of the target similar video;
configuring current tag information of the current video as the target tag information;
and selecting one of the videos having the same target tag information for video recommendation.
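Illustrative only: claim 7 copies the best match's tag onto the duplicate and then recommends a single video per tag. The dict-based records below are hypothetical.

```python
# Propagate the target similar video's tag to the duplicate.
def propagate_tag(current_video: dict, target_similar_video: dict) -> None:
    current_video["tag"] = target_similar_video["tag"]

# Keep only one video per tag when building a recommendation pool.
def one_per_tag(videos: list[dict]) -> list[dict]:
    chosen = {}
    for v in videos:
        chosen.setdefault(v["tag"], v)   # first video seen wins for each tag
    return list(chosen.values())
```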
8. The method of claim 3, wherein the preset convolutional neural network model is a residual network model; the residual network model comprises at least one convolutional layer and a pooling layer connected in series, wherein the convolutional layer is configured to perform feature extraction on an input image, and the pooling layer is configured to perform average pooling on the image features output by the last convolutional layer to obtain an image feature vector of a preset dimension.
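A minimal PyTorch rendering of the structure claim 8 describes: serial convolutional layers with residual (skip) connections, followed by average pooling down to a preset-dimension vector. All layer sizes here are hypothetical.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Skip connection: output = input + transformed input
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))

class TinyResNet(nn.Module):
    def __init__(self, dim: int = 128):              # dim plays the "preset dimension"
        super().__init__()
        self.stem = nn.Conv2d(3, dim, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(ResidualBlock(dim), ResidualBlock(dim))
        self.pool = nn.AdaptiveAvgPool2d(1)          # the average pooling layer

    def forward(self, x):
        x = self.blocks(self.stem(x))
        return self.pool(x).flatten(1)               # (batch, dim) image feature vector
```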
9. A video processing apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire a current video and time sequence information of video frames in the current video;
an extraction module, configured to extract a preset number of video key frames from the current video according to the time sequence information;
an image feature analysis module, configured to perform image feature analysis on each of the video key frames to obtain corresponding image feature analysis results;
a feature fusion module, configured to perform feature fusion on all of the image feature analysis results to obtain video feature information for characterizing the current video;
and a determination module, configured to calculate a similarity between the current video and a preset video based on the video feature information and determine whether the current video is a duplicate video.
10. An electronic device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the video processing method according to any one of claims 1 to 8.
11. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the video processing method according to any one of claims 1 to 8.
CN202110700014.4A 2021-06-23 2021-06-23 Video processing method and device, electronic equipment and readable storage medium Pending CN113313065A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110700014.4A CN113313065A (en) 2021-06-23 2021-06-23 Video processing method and device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110700014.4A CN113313065A (en) 2021-06-23 2021-06-23 Video processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113313065A true CN113313065A (en) 2021-08-27

Family

ID=77380221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110700014.4A Pending CN113313065A (en) 2021-06-23 2021-06-23 Video processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113313065A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650435A (en) * 2022-02-23 2022-06-21 京东科技信息技术有限公司 Method, device and related equipment for searching repeated segments in video
CN116935272A (en) * 2023-07-12 2023-10-24 天翼爱音乐文化科技有限公司 Video content detection method and device, electronic equipment and storage medium
CN118397309A (en) * 2024-01-18 2024-07-26 原创卫士(无锡)科技有限公司 High-robustness infringement video analysis and identification method

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0406512D0 (en) * 2004-03-23 2004-04-28 British Telecomm Method and system for semantically segmenting scenes of a video sequence
US20120177296A1 (en) * 2011-01-07 2012-07-12 Alcatel-Lucent Usa Inc. Method and apparatus for comparing videos
CN103686411A (en) * 2013-12-11 2014-03-26 深圳Tcl新技术有限公司 Method for playing video and multimedia device
CN104036536A (en) * 2013-03-07 2014-09-10 腾讯科技(深圳)有限公司 Method and apparatus for generating stop-motion animation
CN105404698A (en) * 2015-12-31 2016-03-16 海信集团有限公司 Education video recommendation method and device
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device
CN107690088A (en) * 2017-08-04 2018-02-13 天脉聚源(北京)传媒科技有限公司 Method and device for intelligently playing a video
CN108718417A (en) * 2018-05-28 2018-10-30 广州虎牙信息科技有限公司 Method, device, server and storage medium for generating a live streaming room preview icon
CN109815364A (en) * 2019-01-18 2019-05-28 上海极链网络科技有限公司 Massive video feature extraction, storage and retrieval method and system
CN110163061A (en) * 2018-11-14 2019-08-23 腾讯科技(深圳)有限公司 Method, apparatus, device and computer-readable medium for extracting a video fingerprint
CN110619061A (en) * 2018-12-25 2019-12-27 北京时光荏苒科技有限公司 Video classification method and device, electronic equipment and readable medium
CN110650364A (en) * 2019-09-27 2020-01-03 北京达佳互联信息技术有限公司 Video attitude tag extraction method and video-based interaction method
CN110796204A (en) * 2019-11-01 2020-02-14 腾讯科技(深圳)有限公司 Video tag determination method and device and server
CN111753673A (en) * 2020-06-04 2020-10-09 五八有限公司 Video data detection method and device
CN112528077A (en) * 2020-11-10 2021-03-19 山东大学 Video face retrieval method and system based on video embedding

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB0406512D0 (en) * 2004-03-23 2004-04-28 British Telecomm Method and system for semantically segmenting scenes of a video sequence
EP1728195A1 (en) * 2004-03-23 2006-12-06 BRITISH TELECOMMUNICATIONS public limited company Method and system for semantically segmenting scenes of a video sequence
US20120177296A1 (en) * 2011-01-07 2012-07-12 Alcatel-Lucent Usa Inc. Method and apparatus for comparing videos
CN104036536A (en) * 2013-03-07 2014-09-10 腾讯科技(深圳)有限公司 Method and apparatus for generating stop-motion animation
CN103686411A (en) * 2013-12-11 2014-03-26 深圳Tcl新技术有限公司 Method for playing video and multimedia device
CN105404698A (en) * 2015-12-31 2016-03-16 海信集团有限公司 Education video recommendation method and device
CN106709461A (en) * 2016-12-28 2017-05-24 中国科学院深圳先进技术研究院 Video based behavior recognition method and device
CN107690088A (en) * 2017-08-04 2018-02-13 天脉聚源(北京)传媒科技有限公司 Method and device for intelligently playing a video
CN108718417A (en) * 2018-05-28 2018-10-30 广州虎牙信息科技有限公司 Method, device, server and storage medium for generating a live streaming room preview icon
CN110163061A (en) * 2018-11-14 2019-08-23 腾讯科技(深圳)有限公司 Method, apparatus, device and computer-readable medium for extracting a video fingerprint
CN110619061A (en) * 2018-12-25 2019-12-27 北京时光荏苒科技有限公司 Video classification method and device, electronic equipment and readable medium
CN109815364A (en) * 2019-01-18 2019-05-28 上海极链网络科技有限公司 Massive video feature extraction, storage and retrieval method and system
CN110650364A (en) * 2019-09-27 2020-01-03 北京达佳互联信息技术有限公司 Video attitude tag extraction method and video-based interaction method
CN110796204A (en) * 2019-11-01 2020-02-14 腾讯科技(深圳)有限公司 Video tag determination method and device and server
CN111753673A (en) * 2020-06-04 2020-10-09 五八有限公司 Video data detection method and device
CN112528077A (en) * 2020-11-10 2021-03-19 山东大学 Video face retrieval method and system based on video embedding

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650435A (en) * 2022-02-23 2022-06-21 京东科技信息技术有限公司 Method, device and related equipment for searching repeated segments in video
CN114650435B (en) * 2022-02-23 2023-09-05 京东科技信息技术有限公司 Method and device for searching repeated segments in video and related equipment
CN116935272A (en) * 2023-07-12 2023-10-24 天翼爱音乐文化科技有限公司 Video content detection method and device, electronic equipment and storage medium
CN116935272B (en) * 2023-07-12 2024-05-28 天翼爱音乐文化科技有限公司 Video content detection method and device, electronic equipment and storage medium
CN118397309A (en) * 2024-01-18 2024-07-26 原创卫士(无锡)科技有限公司 High-robustness infringement video analysis and identification method

Similar Documents

Publication Publication Date Title
CN110798752B (en) Method and system for generating video summary
Yang et al. Unsupervised extraction of video highlights via robust recurrent auto-encoders
CN113313065A (en) Video processing method and device, electronic equipment and readable storage medium
CN112533051B (en) Barrage information display method, barrage information display device, computer equipment and storage medium
CN101281540B (en) Apparatus, method and computer program for processing information
CN107463698B (en) Method and device for pushing information based on artificial intelligence
CN109508406B (en) Information processing method and device and computer readable storage medium
CN110347872B (en) Video cover image extraction method and device, storage medium and electronic equipment
CN106339507B Streaming media information pushing method and device
CN112860943A (en) Teaching video auditing method, device, equipment and medium
CN111950728B (en) Image feature extraction model construction method, image retrieval method and storage medium
CN112800097A (en) Special topic recommendation method and device based on deep interest network
CN110149529B (en) Media information processing method, server and storage medium
CN113283238B (en) Text data processing method and device, electronic equipment and storage medium
CN111783712A (en) Video processing method, device, equipment and medium
CN112199582B (en) Content recommendation method, device, equipment and medium
CN110413888A Book recommendation method and device
CN116630465B (en) Model training and image generating method and device
CN111382305B (en) Video deduplication method, video deduplication device, computer equipment and storage medium
CN110968721A Method and system for searching massive images for infringement, and computer-readable storage medium thereof
CN112218159B (en) Multimedia information playing method and device, storage medium and electronic device
CN112836088A (en) Method, apparatus, and medium for generating tag corresponding to video
CN111354013A (en) Target detection method and device, equipment and storage medium
CN108090117A Image search method and device, and electronic equipment
US20230230378A1 (en) Method and system for selecting highlight segments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination