Disclosure of Invention
The present disclosure aims to solve at least one of the technical problems in the prior art, and provides a video frame recognition method, a video frame clipping method, a video frame recognition device, a video frame clipping device, an electronic device, and a computer-readable storage medium.
In one aspect of the present disclosure, a method for identifying a video frame is provided, including:
acquiring a target video with a frame;
extracting frames from the target video to obtain a plurality of frame images;
identifying a candidate frame set of the target video according to the plurality of frame images;
and selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
In some optional embodiments, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the positional relationship of each candidate frame on the corresponding frame image includes:
determining a first vertical distance between one side of each candidate frame facing the frame image edge and the corresponding frame image edge;
and if the first vertical distance of at least two candidate frames is smaller than a preset first threshold value, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
and if the first vertical distance of one of the at least two candidate frames is smaller than the first vertical distances of the other candidate frames, taking the one candidate frame as the real frame of the target video.
In some optional embodiments, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the positional relationship of each candidate frame on the corresponding frame image includes:
determining the overlap ratio between each candidate frame and the rest of the candidate frames on the frame image according to the positional relationship of each candidate frame on the corresponding frame image;
and if the overlap ratio of at least two candidate frames is greater than a preset second threshold value, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
determining a second vertical distance between one side of each of the at least two candidate frames, which faces away from the frame image edge, and the frame image edge;
and if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the rest of the candidate frames, taking that candidate frame as the real frame of the target video.
In some optional embodiments, the identifying the candidate frame set of the target video according to the plurality of frame images includes:
identifying a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frame comprises a Gaussian blur frame and/or a solid color frame;
identifying a second candidate frame set of the target video from the plurality of frame images, wherein the second candidate frame comprises a static frame;
and merging the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some optional implementations, the identifying the first candidate frame set of the target video from the plurality of frame images includes:
and identifying a first candidate frame set of the target video from the plurality of frame images by adopting a pre-trained frame detection model.
In some optional implementations, the identifying the second candidate frame set of the target video from the plurality of frame images includes:
and identifying a second candidate frame set of the target video from the plurality of frame images by adopting a frame difference method.
In some optional embodiments, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video further includes:
determining whether text information exists in the at least one candidate frame;
and adjusting, according to a text information operation request of a user, the candidate frame in which the text information exists, to obtain the adjusted real frame of the target video, wherein the text information operation request includes retaining the text information of the candidate frame and/or discarding the text information of the candidate frame.
In another aspect of the present disclosure, there is also provided a clipping method for a video frame, including:
identifying the real frame of the target video according to the video frame identification method described above;
and cutting the real frame.
In another aspect of the present disclosure, there is also provided a video frame recognition apparatus, including:
the acquisition module is used for acquiring the target video with the frame;
the frame extraction module is used for extracting frames of the target video to obtain a plurality of frame images;
the frame identification module is used for identifying a candidate frame set of the target video according to the plurality of frame images;
and the selection module is used for selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the position relation of each candidate frame on the corresponding frame image.
In some optional embodiments, the selecting module includes a determining sub-module and a selecting sub-module, where the selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to a positional relationship of each candidate frame on the corresponding frame image includes:
the determining submodule is used for determining a first vertical distance between one side, facing the frame image edge, of each candidate frame and the corresponding frame image edge;
and the selecting submodule is used for selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video if the first vertical distance of the at least two candidate frames is smaller than a preset first threshold value.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
and the selecting submodule is used for taking one of the at least two candidate frames as the real frame of the target video if the first vertical distance of the one candidate frame is smaller than the first vertical distances of the other candidate frames.
In some optional embodiments, the selecting at least one candidate frame from the candidate frame set as the real frame of the target video according to the positional relationship of each candidate frame on the corresponding frame image includes:
the determining submodule is further used for determining the overlap ratio between each candidate frame and the rest of the candidate frames on the frame image according to the positional relationship of each candidate frame on the corresponding frame image;
and the selecting submodule is used for selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video if the overlap ratio of the at least two candidate frames is greater than a preset second threshold value.
In some optional embodiments, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the determining submodule is used for determining a second vertical distance between one side, facing away from the frame image edge, of each candidate frame in the at least two candidate frames and the frame image edge;
and the selecting submodule is used for taking one of the candidate frames as the real frame of the target video if the second vertical distance of the candidate frame is smaller than the second vertical distances of the rest of the candidate frames.
In some optional embodiments, the frame recognition module further includes a first recognition sub-module, a second recognition sub-module, and a merging sub-module, where the recognizing the candidate frame set of the target video according to the plurality of frame images includes:
the first recognition submodule is used for recognizing a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frame comprises a Gaussian blur frame and/or a solid color frame;
the second identifying sub-module is configured to identify a second candidate frame set of the target video from the plurality of frame images, where the second candidate frame includes a static frame;
and the merging submodule is used for merging the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some alternative embodiments, the first recognition submodule recognizes a first set of candidate frames of the target video from the plurality of frame images using a pre-trained frame detection model.
In some alternative embodiments, the second identifying submodule identifies a second set of candidate frames of the target video from the plurality of frame images using a frame difference method.
In some optional embodiments, the selecting module includes an adjusting sub-module, and the selecting at least one candidate frame from the candidate frame set as the real frame of the target video further includes:
the determining submodule is used for determining whether text information exists in the at least one candidate frame;
the adjusting sub-module is used for adjusting the candidate frame with the text information according to the text information operation request of the user, so as to obtain the adjusted real frame of the target video, wherein the text information operation request includes retaining the text information of the candidate frame and/or discarding the text information of the candidate frame.
In another aspect of the present disclosure, there is also provided a video frame cropping device, the device comprising:
the identification module is used for identifying the real frame of the video according to the video frame identification method described above;
and the cutting module is used for cutting the real frame.
In another aspect of the present disclosure, there is also provided an electronic device including:
one or more processors;
and a storage unit for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video frame identification method or the video frame clipping method described above.
In another aspect of the disclosure, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video frame identification method or the video frame clipping method described above.
According to the video frame identification method, the video frame clipping method, the corresponding devices, the electronic device, and the computer-readable storage medium of the present disclosure, the real frame of a video can be identified from a plurality of candidate frames according to the positional relationship of each candidate frame on the corresponding frame image and then clipped, which has the advantages of high identification accuracy, high efficiency, and low time consumption.
Detailed Description
In order that those skilled in the art will better understand the technical solutions of the present disclosure, the present disclosure will be described in further detail with reference to the accompanying drawings and detailed description.
First, an example electronic device for implementing the video frame recognition method, clipping method, and apparatus of an embodiment of the present disclosure is described with reference to fig. 1.
As shown in fig. 1, electronic device 300 includes one or more processors 310, one or more storage devices 320, input devices 330, output devices 340, etc., interconnected by a bus system and/or other forms of connection mechanisms 350. It should be noted that the components and structures of the electronic device shown in fig. 1 are exemplary only and not limiting, as the electronic device may have other components and structures as desired.
The processor 310 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.
The storage 320 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and may be executed by the processor to implement the client functions and/or other desired functions of the embodiments of the present disclosure described below. Various applications and various data, such as data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 330 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 340 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Next, a video frame recognition method according to another embodiment of the present disclosure will be described with reference to fig. 2.
As shown in fig. 2, a video frame recognition method S100 includes:
S110: acquiring the target video with the frame.
Specifically, in this step, a target video with a frame is selected from a plurality of videos to be identified according to actual requirements. For example, one or more target videos may be selected from candidate videos whose frames need to be identified, according to a user's frame identification request; as another example, one or more target videos may be selected from candidate videos according to a system instruction. This may be determined according to actual needs, and the embodiments of the present disclosure are not limited thereto.
S120: extracting frames from the target video to obtain a plurality of frame images.
Specifically, in this step, the frame extraction may be performed on the target video using an equal-interval or unequal-interval frame extraction method. When the equal-interval method is adopted, frame images are extracted from the target video at equal time or frame intervals, for example, one frame image every 15 s, or one frame image every 15 frames. When the unequal-interval method is adopted, frame images are extracted at varying intervals; for example, the gap between two adjacent extracted frames may increase sequentially, decrease sequentially, or be random. Of course, those skilled in the art may adopt other frame extraction methods to extract a plurality of frame images from the target video according to actual needs, which is not limited in the embodiments of the present disclosure.
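The interval schemes above can be sketched as follows. This is an illustrative Python sketch only: the function names, and the growing-gap scheme used for the unequal-interval case, are assumptions and not part of the disclosure.

```python
def sample_frame_indices(total_frames: int, interval: int = 15) -> list:
    """Equal-interval extraction: take one frame image every `interval` frames."""
    return list(range(0, total_frames, interval))


def increasing_interval_indices(total_frames: int, start: int = 5, step: int = 5) -> list:
    """Unequal-interval extraction (one possible scheme): the gap between two
    adjacent extracted frames increases by `step` each time."""
    indices, i, gap = [], 0, start
    while i < total_frames:
        indices.append(i)
        i += gap
        gap += step
    return indices
```

For a 60-frame clip, `sample_frame_indices(60, 15)` yields the frame indices `[0, 15, 30, 45]`, while the increasing-interval variant spaces samples progressively further apart.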
S130: identifying a candidate frame set of the target video according to the plurality of frame images.
Specifically, in this step, candidate frames of each frame image may be respectively identified by an image identification method, so as to form a candidate frame set of the target video. The image recognition method comprises a pre-trained frame detection model, a color difference method, laplacian transformation, a frame difference method and the like. Of course, other methods for identifying the candidate frame set from the plurality of frame images may be used by those skilled in the art according to actual needs, and the embodiments of the present disclosure are not limited thereto.
Furthermore, in a frame image, the identified candidate frames may be located in different areas of the frame image. Illustratively, as shown in fig. 3, there are three candidate frames and one image area on the frame image, where the candidate frame B is located at the upper edge area of the frame image, the candidate frames A and C are located at the lower edge area, and the image area S1 is located at the central area. As another example, as shown in fig. 4, there are three candidate frames and one image area on the frame image, where the candidate frames D and E are located at the lower edge area, the candidate frame F is located at the upper edge area, and the image area S2 is located at the central area. As another example, as shown in fig. 5, there are two candidate frames and one image area on the frame image, where the candidate frames I and J are located at the left and right edge areas, respectively, and the image area S3 is located at the central area. As yet another example, as shown in fig. 6, there are four candidate frames and one image area on the frame image, where the candidate frames K, L, M, and N are located at the upper, lower, left, and right edge areas, respectively, and the image area S4 is located at the central area. Of course, the positions of candidate frames on a frame image are not limited thereto; for example, candidate frames may appear at only one edge of the frame image, such as the upper edge area, or at the upper and left edge areas, and so on.
S140: selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to the positional relationship of each candidate frame on the corresponding frame image.
Specifically, in this step, illustratively, if a candidate frame is detected at the same position in a plurality of frame images (for example, 80% of all frame images), it may be determined that a real frame exists at that position of the target video. In addition, as shown in fig. 3, the candidate frame A and the candidate frame C are both detected at the lower edge region of the frame image; in this case, the candidate frame C may not be a real frame of the target video and needs to be removed according to the positional relationship between the candidate frame C and the candidate frame A on the frame image, so that the candidate frame A is selected as the real frame of the target video. Of course, those skilled in the art may select other ways of determining the real frame of the target video according to the positional relationship of the candidate frames on the corresponding frame images, which is not limited in the embodiments of the present disclosure.
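The position-voting idea above (a real frame is a candidate detected at roughly the same position in, say, 80% of the frame images) might be sketched as follows. The data layout, the pixel tolerance used for grouping, and the function name are assumptions for illustration only.

```python
def vote_candidate_positions(detections, num_frames, ratio=0.8, tol=5):
    """detections: list of (frame_idx, (x1, y1, x2, y2)) candidate boxes.
    Group boxes whose four coordinates all lie within `tol` pixels of a
    group representative, then keep a representative box for every group
    that was seen in at least `ratio` of the sampled frame images."""
    groups = []  # each entry: [representative_box, set_of_frame_indices]
    for frame_idx, box in detections:
        for g in groups:
            if all(abs(a - b) <= tol for a, b in zip(box, g[0])):
                g[1].add(frame_idx)
                break
        else:
            groups.append([box, {frame_idx}])
    return [box for box, frames in groups if len(frames) >= ratio * num_frames]
```

With five sampled frame images, a bottom-edge box seen in four of them passes the 80% vote, while a box seen only once is discarded.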
According to the video frame identification method, the real frames of the target video can be accurately selected from the plurality of candidate frames by comparing the position relations of the plurality of candidate frames on the corresponding frame images, and the video frame identification precision is improved. In addition, the position relation of the images can be obtained through the coordinates of the images, the operation of comparing the position relation is simple, and the efficiency of identifying the video frames can be effectively improved.
According to the foregoing, as shown in figs. 3 to 6, candidate frames may exist at different edge areas of a frame image; each candidate frame is a rectangular box of a certain size, and there may be a certain distance between a candidate frame and the edge of the frame image. In addition, a candidate frame detected on a frame image is not necessarily a real frame of the target video. As illustrated in fig. 3, two candidate frames, namely the candidate frame A and the candidate frame C, are identified at the lower edge area of the frame image; obviously, the candidate frame C is a falsely detected frame, and the position of a falsely identified candidate frame is not limited thereto. As another example, as shown in fig. 4, two candidate frames, namely the candidate frame D and the candidate frame E, are also identified at the lower edge area of the frame image; obviously, the candidate frame E should be screened out, that is, the candidate frame E is also a falsely detected candidate frame.
The following describes, as a specific example, how to eliminate the false detection of the candidate frame to obtain the real frame, but the embodiments of the present disclosure are not limited thereto.
First, a frame selection process of how to eliminate candidate frame false detection to obtain a real frame is described with fig. 3 as a specific example.
As shown in fig. 7, step S140 specifically includes:
S141: determining a first vertical distance between one side of each candidate frame facing the frame image edge and the corresponding frame image edge.
Specifically, in this step, with reference to fig. 3, the side of the candidate frame A facing the frame image edge has a first vertical distance L1 from that edge, and the side of the candidate frame C facing the frame image edge has a first vertical distance L2 from that edge. It should be understood that, in practice, the first vertical distance L1 between the candidate frame A and the frame image edge is zero, that is, one side of the candidate frame A coincides with the frame image edge.
Further, how the first vertical distance between a candidate frame and the frame image edge is obtained is not limited in this step. For example, the first vertical distance may be obtained from the coordinates of the pixels on the side of the candidate frame facing the frame image edge and the coordinates of the frame image edge pixels. Of course, those skilled in the art may select other ways of obtaining the first vertical distance according to actual needs, and the embodiments of the present disclosure are not limited thereto.
It should be understood that, since the frame image has a plurality of edges, in this step the frame image edge refers to the frame image edge closest to the candidate frame. That is, as shown in fig. 3, for the candidate frame C, the first vertical distance is the distance between the lower side of the candidate frame C and the lower edge of the frame image, not the distance between the upper side of the candidate frame C and the upper edge of the frame image.
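A minimal sketch of step S141, assuming candidate frames are given as (x1, y1, x2, y2) pixel boxes with the origin at the top-left corner, and taking the nearest image edge as the edge the candidate faces. The function name and box convention are assumptions, not from the disclosure.

```python
def first_vertical_distance(box, img_w, img_h):
    """Return (edge_name, distance) for the side of `box` (x1, y1, x2, y2)
    facing its nearest image edge: the first vertical distance of step S141."""
    x1, y1, x2, y2 = box
    dists = {"top": y1, "bottom": img_h - y2, "left": x1, "right": img_w - x2}
    edge = min(dists, key=dists.get)
    return edge, dists[edge]
```

For a 1280x720 frame image, a bottom candidate flush with the image edge has distance 0 and is preferred in step S142 over one that sits 70 px above the edge.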
S142: if the first vertical distance of at least two candidate frames is smaller than a preset first threshold value, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
Specifically, in this step, the preset first threshold is set according to the actual situation. It may be a specific value, for example, 0.5 cm, 1 cm, or 1.5 cm; or it may be a fixed proportion of the frame image size, for example, 0.5%, 1%, or 2% of the image size. If the candidate frame is an upper or lower frame, the first threshold is a fixed proportion of the frame image height; if the candidate frame is a left or right frame, the first threshold is a fixed proportion of the frame image width. For example, the first threshold may be 4.8 dpi for an upper or lower frame and 6.4 dpi for a left or right frame.
Illustratively, step S142 specifically includes:
and if the first vertical distance of one of the at least two candidate frames is smaller than the first vertical distances of the other candidate frames, taking the one candidate frame as the real frame of the target video.
Specifically, as shown in fig. 3, there is a first vertical distance L1 between the candidate frame A and the frame image edge, and a first vertical distance L2 between the candidate frame C and the frame image edge. Obviously, the first vertical distance L1 is smaller than the first vertical distance L2, so the candidate frame A is taken as the real frame of the target video.
According to the video frame identification method of this example, when the first vertical distances of a plurality of candidate frames satisfy the preset first threshold condition, the positional relationship between the candidate frames and the frame image edge is further judged by comparing their first vertical distances, and the candidate frame closest to the frame image edge is selected as the real frame of the target video. This effectively identifies falsely detected candidate frames that are far from the frame image edge and improves the accuracy of video frame identification.
Next, a frame selection process of how to eliminate the candidate frame false detection to obtain a real frame will be described with fig. 4 as another specific example.
As shown in fig. 8, step S140 specifically includes:
S143: determining the overlap ratio between each candidate frame and the rest of the candidate frames on the frame image according to the positional relationship of each candidate frame on the corresponding frame image.
Specifically, in this step, the positional relationship may be obtained from the pixel coordinates of each candidate frame on the corresponding frame image, and the overlap ratio may be determined by comparing the overlapping proportion of the pixel coordinates between the candidate frames. Of course, those skilled in the art may select other ways of calculating the overlap ratio between candidate frames according to actual needs, which is not limited in the embodiments of the present disclosure.
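One plausible implementation of the overlap ratio between two candidate frames, assuming axis-aligned (x1, y1, x2, y2) pixel boxes. The disclosure does not fix the exact formula, so normalizing the intersection by the smaller box's area is an assumption made here for illustration.

```python
def overlap_ratio(box_a, box_b):
    """Fraction of the smaller box's area covered by the intersection of the
    two boxes: one plausible reading of the 'overlap ratio' of step S143."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))  # intersection width
    ih = max(0, min(ay2, by2) - max(ay1, by1))  # intersection height
    inter = iw * ih
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    smaller = min(area_a, area_b)
    return inter / smaller if smaller else 0.0
```

Two nested bottom-edge boxes, as in fig. 4, yield a ratio of 1.0 and would exceed an 80% second threshold, triggering the selection of step S144.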
S144: if the overlap ratio of at least two candidate frames is greater than a preset second threshold, selecting at least one candidate frame from the at least two candidate frames as a real frame of the target video.
Specifically, in this step, it is determined whether the overlapping proportion of the pixel coordinates is greater than the preset second threshold, and if so, at least one candidate frame is selected from the at least two candidate frames as the real frame of the target video. The preset second threshold is set according to the actual situation and may be, for example, 90%, 80%, or 60%; in this embodiment, the preset second threshold is 80%.
For example, as shown in fig. 4, the overlap ratio between the candidate frame D and the candidate frame E is high and exceeds the second threshold; in this case, at least one candidate frame needs to be selected from the candidate frame D and the candidate frame E as the real frame of the target video.
Specifically, if the overlap ratio of at least two candidate frames is greater than a preset second threshold, determining a second vertical distance between one side, away from the frame image edge, of each candidate frame in the at least two candidate frames and the frame image edge.
Illustratively, as shown in fig. 4, the side of the candidate frame D facing away from the frame image edge has a second vertical distance L8 from that edge, and the side of the candidate frame E facing away from the frame image edge has a second vertical distance L7 from that edge. How the second vertical distance between a candidate frame and the frame image edge is obtained is not limited in this step; for example, it may be obtained from the coordinates of the pixels on the side of the candidate frame facing away from the frame image edge and the coordinates of the frame image edge pixels. Of course, those skilled in the art may select other ways of obtaining the second vertical distance according to actual needs, and the embodiments of the present disclosure are not limited thereto.
It should also be appreciated that, since the frame image has a plurality of edges, in this step the frame image edge refers to the frame image edge closest to the candidate frame. That is, as shown in fig. 4, for the candidate frame D, the second vertical distance is the distance between the upper side of the candidate frame D and the lower edge of the frame image.
and if the second vertical distance of one of the candidate frames is smaller than the second vertical distances of the rest of the candidate frames, taking that candidate frame as the real frame of the target video.
Specifically, as shown in fig. 4, there is a second vertical distance L8 between the candidate frame D and the frame image edge, and a second vertical distance L7 between the candidate frame E and the frame image edge. Here the second vertical distance L8 is smaller than the second vertical distance L7, so the candidate frame D is taken as the real frame of the target video.
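The tie-break above might be sketched as follows for bottom-edge candidates, assuming (x1, y1, x2, y2) boxes with the origin at the top-left, so the side facing away from the bottom image edge is y1. The function name and the restriction to bottom-edge frames are assumptions for illustration.

```python
def pick_by_second_vertical_distance(boxes, img_h):
    """Among overlapping bottom-edge candidate boxes (x1, y1, x2, y2), keep
    the one whose side facing away from the bottom image edge (its top side,
    y1) is closest to that edge, i.e. the thinnest candidate frame."""
    return min(boxes, key=lambda b: img_h - b[1])
```

For the fig. 4 case, the inner (thinner) box is kept and the wider overlapping box is screened out as a false detection.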
According to the video frame identification method of this example, when the overlap ratio of a plurality of candidate frames exceeds the preset second threshold, the real frame is further selected by comparing the second vertical distances of the candidate frames; that is, among the candidate frames with a high overlap ratio, the one closest to the frame image edge is selected as the real frame. This improves identification accuracy while maintaining identification efficiency.
As shown in fig. 9, step S130 specifically includes:
S131: identifying a first candidate frame set of the target video from the plurality of frame images, wherein the first candidate frame comprises a Gaussian blur frame and/or a solid color frame.
Specifically, in this step, a pre-trained frame detection model may illustratively be used to identify the first candidate frame set of the target video from the plurality of frame images. The frame detection model can accurately identify Gaussian blur frames and solid color frames on the frame images, and how the frame detection model is trained is not limited here. Of course, those skilled in the art may obtain the frame detection model in other ways according to actual needs, and the embodiments of the present disclosure are not limited thereto.
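The patent leaves the detection model itself open. Purely to illustrate the numeric signature such a model would learn, a naive hand-rolled heuristic (not the claimed model) could flag a uniform band of rows at the bottom image edge as a solid color frame; Gaussian blur frames would need a different cue and are not covered by this sketch.

```python
def is_solid_color_band(rows, tolerance=2):
    """True if every pixel in the given rows has (almost) the same gray
    level -- the numeric signature of a pure-color border band.
    `rows` is a list of lists of 0-255 gray values."""
    pixels = [p for row in rows for p in row]
    return max(pixels) - min(pixels) <= tolerance

def detect_bottom_solid_border(image, max_band=None):
    """Scan upward from the bottom edge and return the height (in rows) of
    the contiguous solid-color band; 0 if the bottom rows are not uniform."""
    height = len(image)
    max_band = max_band or height // 2
    band = 0
    for h in range(1, max_band + 1):
        if is_solid_color_band(image[height - h:height]):
            band = h
        else:
            break
    return band
```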
S132: and identifying a second candidate frame set of the target video from the plurality of frame images, wherein the second candidate frame comprises a static frame.
Specifically, in this step, a frame difference method may illustratively be used to identify the second candidate frame set of the target video from the plurality of frame images. The frame difference method specifically includes comparing the pixel gray levels of two adjacent frame images to obtain the region that is the same in both frame images, and taking that same region as a static frame region. In addition, those skilled in the art may select other methods to identify the second candidate frame according to actual needs, which is not limited by the embodiments of the present disclosure.
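The frame difference comparison described here can be sketched as follows. This is a minimal illustration operating on grayscale images stored as lists of pixel rows; a real implementation would operate on decoded video frames.

```python
def static_rows(frame_a, frame_b, tolerance=1):
    """Frame difference sketch: compare the gray levels of two adjacent
    frame images row by row; rows that are (nearly) identical in both
    frames belong to the candidate static-frame region."""
    same = []
    for y, (row_a, row_b) in enumerate(zip(frame_a, frame_b)):
        if all(abs(a - b) <= tolerance for a, b in zip(row_a, row_b)):
            same.append(y)
    return same
```

Here rows 0 and 2 are unchanged between the two frames and would be kept as static-frame candidates, while row 1 (moving content) is discarded.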
S133: and merging the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
Specifically, in this step, the first candidate frame set and the second candidate frame set obtained through recognition by different methods are combined to obtain the candidate frame set.
According to the video frame identification method of this embodiment, frames in the plurality of frame images are identified using a plurality of different methods, so that multiple types of frames can be accurately identified. This avoids situations where, because an identification method is poorly matched to the images being identified, candidate frames are missed and the real frame selected from the candidate frames is incorrect, thereby improving the accuracy of video frame identification.
It should be appreciated that, to reduce the workload of identifying candidate frames from the plurality of frame images, the plurality of frame images may first all be sequentially input into the frame detection model to detect Gaussian blur frames and solid color frames. Then, the frame difference method is used to identify static frames from the plurality of frame images. It is not difficult to understand that, during static frame identification, the positions in a frame image where a Gaussian blur frame or a solid color frame has already been detected need not be examined again by the frame difference method. Obviously, this can greatly improve the efficiency of detecting static frames by the frame difference method and greatly reduce the workload of identifying static frames by the frame difference method.
In some possible embodiments, the candidate frames identified in step S130 may, in addition to the situations shown in figs. 3 to 6, have the situation shown in fig. 11; that is, text information may exist in the candidate frames identified in step S130. As shown in fig. 11, the candidate frame F identified at the lower edge region of the frame image contains text information such as "to feel a high-end player's wonderful operation".
The following describes in detail how to recognize whether text information exists in the candidate frames, and how frames containing text information are processed, so as to obtain a real frame of the target video that meets the user's expectations.
As shown in fig. 10, step S140 specifically further includes:
s145: determining whether text information exists in the at least one candidate frame.
Specifically, in this embodiment, the text information may be caption information; for example, in a movie, captions may be displayed in real time in an edge area of the frame image. For another example, the text information may be bullet screen information; for example, in a movie, bullet screen comments input by users, such as "666, too wonderful" or "I like to eat stinky tofu", are displayed in a certain area of the frame image. For such text information in the frame image, a mature text recognition method, such as an OCR text recognition method, may be used to determine whether text information exists in a candidate frame, which is not limited by the embodiments of the present disclosure.
S146: and according to a text information operation request of a user, adjusting the candidate frames with the text information, and obtaining the adjusted real frames of the target video, wherein the text information operation request comprises the text information reserved by the candidate frames and/or the text information discarded by the candidate frames.
Specifically, in this step, for a candidate frame in which text information exists, the user may wish the detected real frame to contain no text information, as shown in fig. 12, or may of course wish the detected real frame to retain the text information, as shown in fig. 11. Thus, according to these two choices of the user, a corresponding text information operation request may be generated: retaining the text information in the candidate frame corresponds to the case shown in fig. 11, and discarding the text information from the candidate frame corresponds to the case shown in fig. 12. The manner of receiving the user's text information operation request is not limited; it may be received through a keyboard, a mouse, a touch display screen, a voice device, etc., and the embodiments of the present disclosure are not limited thereto. In addition, besides generating the text information operation request according to the user's selection, it may also be generated by a default rule, for example, a rule that by default the candidate frame discards the text information or retains the text information.
Illustratively, as shown in fig. 11, two candidate frames and one image area S5 exist in the frame image: text information G exists in the candidate frame F, and no text information exists in the candidate frame O. Since the candidate frame O contains no text information, it is not adjusted in this step. When the user's text information operation request is to retain the text information in the candidate frame, the candidate frame F is not adjusted either, and the candidate frame F and the candidate frame O may be directly taken as real frames of the target video.
Conversely, when the user's text information operation request is to discard the text information from the candidate frame, the candidate frame O is still not adjusted, but the candidate frame F needs to be adjusted. In this case, the candidate frame F may be reduced until the text information G no longer exists within it, as shown in fig. 12, to obtain the adjusted real frame H of the target video.
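As a geometric sketch of this adjustment (hypothetical coordinates, assuming a bottom-edge candidate frame like F in fig. 11 and a text bounding box already located by OCR; rectangles are (left, top, right, bottom) with y growing downward):

```python
def adjust_border_for_text(border, text_box, keep_text):
    """Adjust a bottom-edge candidate frame per the user's text information
    operation request. With keep_text=True the frame is returned unchanged
    (fig. 11); with keep_text=False its top side is moved down past the
    text box so the text no longer lies inside the frame (fig. 12)."""
    if keep_text:
        return border
    left, top, right, bottom = border
    _, _, _, text_bottom = text_box
    # Shrink until the text box no longer lies inside the candidate frame.
    return (left, max(top, text_bottom), right, bottom)
```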
According to the video frame identification method of this embodiment, the size of the candidate frame is selectively adjusted through the retention or discarding of the text information, so that the size of the real frame of the target video can be adjusted adaptively according to the text information operation request, improving the practicability and user friendliness of frame identification.
Next, a video frame cropping method S200 according to another embodiment of the present disclosure is described with reference to fig. 13, where the method includes:
s210: and identifying the real frame of the target video by adopting a video frame identification method.
Specifically, in this step, the real frame in the target video may be identified by using the video frame identification method described above; reference may be made to the foregoing description, which is not repeated here.
S220: and cutting the real frame.
Specifically, in this step, how the real frame is cropped is not limited. Illustratively, after the real frame of the target video is detected, the size of the target video may be automatically adjusted according to the size of the real frame, so as to generate a new video without a frame.
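As a minimal sketch of this cropping step (assuming a bottom-edge real frame and a frame image stored as a list of pixel rows; a real implementation would apply the same crop to every frame of the video):

```python
def crop_out_bottom_border(image, real_border):
    """Remove a detected bottom-edge real frame (left, top, right, bottom)
    by keeping only the content rows above its top side."""
    _, border_top, _, _ = real_border
    return [row[:] for row in image[:border_top]]
```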
According to the video frame cropping method of this embodiment, by adopting the video frame identification method described above, the real frame of the target video can be accurately determined from the candidate frames by comparing the positional relationships of the candidate frames on the corresponding frame images. Therefore, after the real frame is cropped, an accurate new video without a frame can be obtained, improving the cropping precision for the frame of the target video.
Next, a video frame recognition apparatus 100 according to another embodiment of the present disclosure is described with reference to fig. 14, and the apparatus may be applied to the video frame recognition method described above, and the detailed description thereof will be omitted herein. The device includes an acquisition module 110, a frame extraction module 120, a frame identification module 130, and a selection module 140, specifically:
the acquiring module 110 is configured to acquire a target video with a frame.
The frame extraction module 120 is configured to extract frames from the target video to obtain a plurality of frame images.
The frame recognition module 130 is configured to recognize a candidate frame set of the target video according to the plurality of frame images.
The selecting module 140 is configured to select, according to a positional relationship of each candidate frame on the corresponding frame image, at least one candidate frame from the candidate frame set as a real frame of the target video.
According to the video frame recognition device of this embodiment of the present disclosure, the real frame of the target video is selected from the plurality of candidate frames by comparing the positional relationships of the candidate frames on the corresponding frame images, so that the real frame can be accurately selected from the candidate frames, improving video frame recognition precision. In addition, the positional relationships can be obtained from image coordinates, and the operation of comparing them is simple, which can effectively improve the efficiency of identifying video frames.
Illustratively, as shown in fig. 14, the selecting module 140 includes a determining sub-module 141 and a selecting sub-module 142, where selecting at least one candidate frame from the candidate frame set as a real frame of the target video according to a positional relationship of each candidate frame on the corresponding frame image includes:
the determining submodule 141 is configured to determine a first vertical distance between a side, facing the frame image edge, of each candidate frame and the corresponding frame image edge;
the selecting sub-module 142 is configured to select at least one candidate frame from at least two candidate frames as a real frame of the target video if the first vertical distances of the at least two candidate frames are less than a preset first threshold.
Illustratively, as shown in fig. 14, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the selecting sub-module 142 is configured to take one of the at least two candidate frames as a real frame of the target video if the first vertical distance of the one of the at least two candidate frames is smaller than the first vertical distances of the other candidate frames.
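The two rules above (threshold filter, then nearest-to-edge selection) can be sketched together as follows. This is a hypothetical illustration with (left, top, right, bottom) rectangles, y growing downward, considering only the top and bottom image edges:

```python
def first_vertical_distance(candidate, image_height):
    """Distance from the side of a candidate frame FACING the nearest
    horizontal image edge to that edge (zero when it touches the edge)."""
    _, top, _, bottom = candidate
    return min(top, image_height - bottom)

def pick_real_border(candidates, image_height, first_threshold):
    """Keep candidates within the first threshold of an image edge, then
    take the one closest to the edge, per the selection rule above."""
    near = [c for c in candidates
            if first_vertical_distance(c, image_height) < first_threshold]
    if not near:
        return None
    return min(near, key=lambda c: first_vertical_distance(c, image_height))
```

A candidate sitting mid-image (large first vertical distance) is rejected as a false detection, while of two near-edge candidates the one touching the edge most closely wins.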
As shown in fig. 14, the selecting, according to the positional relationship of each candidate frame on the corresponding frame image, at least one candidate frame from the candidate frame set as the real frame of the target video includes:
the determining submodule 141 is further configured to determine, according to the positional relationship of each candidate frame on the corresponding frame image, the overlap ratio between each candidate frame and the remaining candidate frames on the frame image;
the selecting sub-module 142 is configured to select at least one candidate frame from the at least two candidate frames as a real frame of the target video if the overlap ratio of the at least two candidate frames is greater than a preset second threshold.
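The patent does not fix a formula for the overlap ratio; one common choice, shown here purely as an illustration, is the intersection-over-union of the two rectangles:

```python
def overlap_ratio(a, b):
    """Intersection-over-union of two (left, top, right, bottom) rectangles --
    one plausible reading of the overlap ratio compared against the preset
    second threshold."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))   # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))   # intersection height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

Identical rectangles give 1.0, disjoint ones 0.0; candidate frames whose ratio exceeds the second threshold would proceed to the second-vertical-distance comparison.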
Illustratively, as shown in fig. 14, the selecting at least one candidate frame from the at least two candidate frames as the real frame of the target video includes:
the determining submodule 141 is configured to determine a second vertical distance between a side, away from the frame image edge, of each of the at least two candidate frames and the frame image edge;
the selecting sub-module 142 is configured to take one of the candidate frames as the real frame of the target video if the second vertical distance of the one of the candidate frames is smaller than the second vertical distances of the other candidate frames.
Illustratively, as shown in fig. 14, the frame identifying module 130 further includes a first identifying sub-module 131, a second identifying sub-module 132, and a merging sub-module 133, where identifying the candidate frame set of the target video according to the plurality of frame images includes:
the first identifying sub-module 131 is configured to identify a first candidate frame set of the target video from the plurality of frame images, where the first candidate frame includes a gaussian blur frame and/or a solid color frame;
the second identifying sub-module 132 is configured to identify a second candidate frame set of the target video from the plurality of frame images, where the second candidate frame includes a static frame;
the merging submodule 133 is configured to merge the first candidate frame set and the second candidate frame set to obtain the candidate frame set.
In some alternative embodiments, the first recognition submodule recognizes a first set of candidate frames of the target video from the plurality of frame images using a pre-trained frame detection model.
In some alternative embodiments, the second identifying submodule identifies a second set of candidate frames of the target video from the plurality of frame images using a frame difference method.
Illustratively, as shown in fig. 14, the selecting module 140 includes an adjusting sub-module 143, where selecting at least one candidate frame from the candidate frame set as the real frame of the target video includes:
the determining submodule 141 is configured to determine whether text information exists in the at least one candidate frame;
the adjusting sub-module 143 is configured to adjust, according to a text information operation request of a user, a candidate frame in which the text information exists, so as to obtain the adjusted real frame of the target video, where the text information operation request includes that the candidate frame retains the text information and/or that the candidate frame discards the text information.
According to the video frame identification device provided in this embodiment, the real frame of the target video can be selected from the candidate frames by comparing the positional relationships of the candidate frames on the corresponding frame images; falsely detected candidate frames far from the frame image edges can be effectively identified; among candidate frames with a high overlap ratio, the candidate frame close to the frame image edge is selected as the real frame; and the size of the real frame of the target video can be adaptively adjusted according to the text information operation request. Video frame identification with high accuracy, high efficiency, strong practicability, and good user friendliness is thereby achieved.
Next, a video frame cropping device 200 according to another embodiment of the present disclosure is described with reference to fig. 15. The device includes an identification module 210 and a cropping module 220, specifically:
the identification module 210 is configured to identify the real frame of the target video according to the video frame identification method described above.
The cropping module 220 is configured to crop the real frame.
According to the video frame cropping device of this embodiment, by adopting the video frame identification device described above, the positional relationships of the candidate frames on the corresponding frame images can be compared and the real frame of the target video can be accurately determined from the candidate frames. Therefore, after the real frame is cropped, an accurate new video without a frame can be obtained, improving the cropping precision for the frame of the target video.
Further, the embodiment also discloses an electronic device, which includes:
one or more processors;
and a storage unit configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the video frame identification method or the video frame cropping method described above.
Further, this embodiment also discloses a computer-readable storage medium having a computer program stored thereon, where the computer program, when executed by a processor, can implement the video frame identification method or the video frame cropping method described above.
Wherein the computer readable medium may be embodied in the apparatus, device, system of the present disclosure or may exist alone.
The computer-readable storage medium may be any tangible medium that can contain or store a program, and may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, an optical fiber, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
The computer-readable medium may also include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is embodied; specific examples include, but are not limited to, electromagnetic signals, optical signals, or any suitable combination thereof.
It is to be understood that the above embodiments are merely exemplary embodiments employed to illustrate the principles of the present disclosure, however, the present disclosure is not limited thereto. Various modifications and improvements may be made by those skilled in the art without departing from the spirit and substance of the disclosure, and are also considered to be within the scope of the disclosure.