CN114554285B

CN114554285B - Video interpolation processing method, video interpolation processing device and readable storage medium

Info

Publication number: CN114554285B
Application number: CN202210178989.XA
Authority: CN
Inventors: 孙梦笛; 朱丹
Original assignee: BOE Technology Group Co Ltd
Current assignee: BOE Technology Group Co Ltd
Priority date: 2022-02-25
Filing date: 2022-02-25
Publication date: 2024-08-02
Anticipated expiration: 2042-02-25
Also published as: WO2023160617A9; CN114554285A; WO2023160617A1; US20240251056A1

Abstract

A video frame interpolation processing method, a video frame interpolation processing device, and a storage medium. The method comprises: acquiring a first video frame and a second video frame of a video; acquiring, based on the first video frame and the second video frame, a first comparison result between them; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are temporally adjacent, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether there is a picture switch between the first video frame and the second video frame. Because the method compares adjacent video frames and performs frame interpolation selectively, it avoids the obvious distortion that picture switches cause during frame interpolation, keeps the video smooth, and improves the user's viewing experience.

Description

Video interpolation processing method, video interpolation processing device and readable storage medium

Technical Field

Embodiments of the present disclosure relate to a video interpolation processing method, a video interpolation processing apparatus, and a non-transitory readable storage medium.

Background

Video processing is a typical application of artificial intelligence, and video frame interpolation is a typical technique in video processing. It aims to synthesize smoothly transitioning intermediate video frames from the preceding and following frames of a video segment, so that playback is smoother and the user's viewing experience is improved. For example, video frame interpolation can convert a 24 fps video into a 48 fps video, so that the video appears clearer and smoother to the viewer.
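The 24 fps to 48 fps example amounts to placing one new frame halfway between each pair of adjacent frames. The following is a minimal Python sketch of that timing arithmetic only; `doubled_timestamps` is a hypothetical helper for illustration and is not part of the disclosure.

```python
def doubled_timestamps(num_frames: int, fps: float) -> list:
    """Presentation times (in seconds) after inserting one frame midway
    between every pair of adjacent input frames."""
    times = []
    for i in range(num_frames):
        t = i / fps  # original frame i
        times.append(t)
        if i < num_frames - 1:
            # the interpolated frame sits halfway to the next original frame
            times.append(t + 0.5 / fps)
    return times


ts = doubled_timestamps(24, 24.0)
print(len(ts))  # 47: 24 original frames plus 23 inserted ones
```

Because N frames have only N - 1 gaps, one second of 24 frames yields 47 frames spaced 1/48 s apart; over a continuous stream this is the 48 fps rate.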

Disclosure of Invention

At least one embodiment of the present disclosure provides a video frame interpolation processing method, including: acquiring a first video frame and a second video frame of a video; acquiring, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are temporally adjacent, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame.

For example, in a method provided by at least one embodiment of the present disclosure, the picture switch includes a subtitle switch and/or a scene switch.

For example, in a method provided by at least one embodiment of the present disclosure, obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame includes: determining whether a subtitle switch exists between the first video frame and the second video frame based on whether the subtitle contents of the first video frame and the second video frame are the same.

For example, in a method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same includes: acquiring an audio segment corresponding to the first video frame; acquiring, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment; and determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame.

For example, in a method provided by at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on the start video frame and the end video frame includes: in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.

For example, in a method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same includes: acquiring first recognized text content of the first video frame; acquiring second recognized text content of the second video frame; and in response to the first recognized text content and the second recognized text content being the same, determining that the subtitle switch does not exist between the first video frame and the second video frame.

For example, in a method provided in at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on whether their subtitle contents are the same further includes: in response to the first recognized text content and the second recognized text content being different, acquiring a first sub-image of the first video frame, acquiring a second sub-image of the second video frame, and determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame. The first sub-image corresponds to the first subtitle content of the first video frame, and the second sub-image corresponds to the second subtitle content of the second video frame.

For example, in a method provided by at least one embodiment of the present disclosure, determining whether the subtitle switch exists between the first video frame and the second video frame based on the first sub-image and the second sub-image includes: determining a first similarity between the first sub-image and the second sub-image; in response to the first similarity being greater than a first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame; and in response to the first similarity not being greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.
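The two-stage subtitle check (compare recognized text first, then fall back to comparing the subtitle sub-images against the first threshold) can be sketched as below. This is an illustration under stated assumptions: the recognized texts are passed in as plain strings (no particular OCR engine is implied), sub-images are small grayscale lists, the similarity metric (1 minus the normalized mean absolute difference) and the threshold value 0.9 are hypothetical choices, and the function names are invented for this sketch.

```python
from typing import List

Image = List[List[int]]  # grayscale sub-image (subtitle region), values 0..255


def sub_image_similarity(a: Image, b: Image) -> float:
    """Assumed metric: 1 minus the mean absolute pixel difference, scaled to [0, 1]."""
    diffs = [abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    if not diffs:
        raise ValueError("sub-images must be non-empty")
    return 1.0 - sum(diffs) / (255.0 * len(diffs))


def subtitle_switched(text1: str, text2: str, sub1: Image, sub2: Image,
                      first_threshold: float = 0.9) -> bool:
    # Identical recognized text: no subtitle switch.
    if text1 == text2:
        return False
    # Different text (which may be an OCR error): compare the subtitle regions.
    # Similarity not greater than the first threshold means a switch.
    return sub_image_similarity(sub1, sub2) <= first_threshold


a = [[0, 255], [255, 0]]
print(subtitle_switched("hi", "h1", a, a))                     # False
print(subtitle_switched("hi", "bye", a, [[255, 0], [0, 255]]))  # True
```

The image fallback is what keeps a single misrecognized character ("hi" vs "h1") from being reported as a subtitle switch.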

For example, in a method provided by at least one embodiment of the present disclosure, obtaining the first comparison result between the first video frame and the second video frame based on the first video frame and the second video frame includes: determining whether a scene switch exists between the first video frame and the second video frame based on whether the scenes of the first video frame and the second video frame are the same.

For example, in a method provided by at least one embodiment of the present disclosure, determining whether the scene switch exists between the first video frame and the second video frame based on whether their scenes are the same includes: acquiring a second similarity between the first video frame and the second video frame; in response to the second similarity being greater than a second threshold, determining that the scene switch does not exist between the first video frame and the second video frame; and in response to the second similarity not being greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.
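The disclosure does not fix how the second similarity is computed at this point. As one hypothetical choice, the sketch below uses the intersection of normalized grayscale histograms, which is 1.0 for identical intensity distributions and 0.0 for disjoint ones, and compares it against an assumed second threshold of 0.5. The names and the metric are illustrative only.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame, values 0..255


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Normalized intensity histogram of a frame."""
    hist = [0] * bins
    count = 0
    for row in frame:
        for p in row:
            hist[p * bins // 256] += 1  # map 0..255 onto bins 0..bins-1
            count += 1
    return [h / count for h in hist]


def frame_similarity(f1: Frame, f2: Frame, bins: int = 16) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    h1, h2 = gray_histogram(f1, bins), gray_histogram(f2, bins)
    return sum(min(x, y) for x, y in zip(h1, h2))


def scene_switched(f1: Frame, f2: Frame, second_threshold: float = 0.5) -> bool:
    # Similarity not greater than the second threshold means a scene switch.
    return frame_similarity(f1, f2) <= second_threshold


dark = [[10] * 4 for _ in range(4)]
bright = [[240] * 4 for _ in range(4)]
print(scene_switched(dark, dark))    # False
print(scene_switched(dark, bright))  # True
```

A histogram comparison ignores where objects are in the frame, so ordinary motion within one scene keeps the similarity high while a cut to a differently lit scene drops it sharply.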

For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame includes: in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to insert a frame between the first video frame and the second video frame; and in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to insert a frame between the first video frame and the second video frame.

For example, the method provided in at least one embodiment of the present disclosure further includes: setting a first interpolation flag; and in response to the picture switch existing between the first video frame and the second video frame, changing the first interpolation flag to a second interpolation flag.

For example, the method provided in at least one embodiment of the present disclosure further includes: in response to the picture switch existing between the first video frame and the second video frame, acquiring a fourth video frame; acquiring, based on the second video frame and the fourth video frame, a second comparison result between the second video frame and the fourth video frame; and determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame. The fourth video frame and the second video frame are temporally adjacent, and the second video frame is the forward frame of the fourth video frame. The second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame.

For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, inserting a plurality of video frames between the second video frame and the fourth video frame. The number of inserted video frames is based on the second interpolation flag.

For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame includes: in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to insert a video frame between the second video frame and the fourth video frame, and changing the second interpolation flag to a third interpolation flag, where the third interpolation flag indicates the number of frames to insert at the next frame interpolation.
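One reading of the interpolation flags is a counter of how many frames the next interpolation should produce: each skipped pair raises the count by one, and the next pair without a picture switch receives that many frames, so the output keeps the same total frame count as interpolating every pair. The sketch below follows that reading; the integer flag values, the reset to 1 after insertion, and all function names are assumptions, and frames are toy numbers with linear blending standing in for a real interpolator.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")


def interpolate_with_flag(frames: Sequence[F],
                          has_switch: Callable[[F, F], bool],
                          make_frames: Callable[[F, F, int], List[F]]) -> List[F]:
    """Skip interpolation across picture switches and make up the skipped
    frames at the next pair without a switch (assumed reading of the flag)."""
    if not frames:
        return []
    out: List[F] = [frames[0]]
    flag = 1  # first interpolation flag: one frame per adjacent pair
    for prev, nxt in zip(frames, frames[1:]):
        if has_switch(prev, nxt):
            flag += 1  # e.g. first flag -> second flag -> third flag
        else:
            out.extend(make_frames(prev, nxt, flag))
            flag = 1  # assumption: reset once the skipped frames are made up
        out.append(nxt)
    return out


def linear(a: float, b: float, n: int) -> List[float]:
    # n evenly spaced values strictly between a and b
    return [a + (b - a) * k / (n + 1) for k in range(1, n + 1)]


result = interpolate_with_flag([0, 10, 100, 110], lambda a, b: abs(b - a) > 50, linear)
print(len(result))  # 7, the same as 2 * 4 - 1 with every pair interpolated
```

In this sketch, a switch at the very end of the video leaves the raised flag unused; how the source handles that case is not stated here.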

For example, the method provided in at least one embodiment of the present disclosure further includes: in response to a third video frame being inserted between the first video frame and the second video frame, acquiring a first sub-image of the first video frame, acquiring a third sub-image of the third video frame, and determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame. The first sub-image corresponds to the first subtitle content in the first video frame, and the third sub-image corresponds to the third subtitle content in the third video frame.

For example, in a method provided by at least one embodiment of the present disclosure, determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame includes: acquiring the pixel value of a first pixel in the first sub-image; setting the pixel value of a third pixel in the third sub-image based on the pixel value of the first pixel; and determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame. The pixel value of the first pixel is greater than a third threshold, and the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image.
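The post-processing step can be illustrated with a small sketch, under several assumptions that the text above does not state: the third pixel is set by copying the first pixel's value, pixels brighter than the third threshold (here 200) are taken to be subtitle text, and the decision rule is that any bright pixel left in the set third sub-image where the first sub-image is dark counts as residue from the other subtitle, which triggers replacing the interpolated frame with the first video frame. All names and parameter values are hypothetical.

```python
from typing import List

Image = List[List[int]]  # grayscale subtitle sub-image, values 0..255


def set_third_sub_image(first: Image, third: Image, third_threshold: int = 200) -> Image:
    """Copy every first-sub-image pixel brighter than the third threshold
    (assumed to be subtitle text) to the same position in the third sub-image."""
    result = [row[:] for row in third]
    for y, row in enumerate(first):
        for x, p in enumerate(row):
            if p > third_threshold:
                result[y][x] = p
    return result


def should_replace_with_first(first: Image, third: Image,
                              third_threshold: int = 200, max_residual: int = 0) -> bool:
    """Assumed decision rule: bright pixels left in the set third sub-image
    where the first sub-image is dark are treated as subtitle residue."""
    set_third = set_third_sub_image(first, third, third_threshold)
    residual = sum(1 for r1, r3 in zip(first, set_third) for a, b in zip(r1, r3)
                   if b > third_threshold and a <= third_threshold)
    return residual > max_residual


first = [[0, 255, 0], [0, 255, 0]]
clean = [[10, 120, 10], [10, 130, 10]]   # blurred but same subtitle
ghost = [[10, 120, 250], [10, 130, 10]]  # extra bright stroke from the other subtitle
print(should_replace_with_first(first, clean))  # False
print(should_replace_with_first(first, ghost))  # True
```

Overwriting the text pixels first means a merely blurred copy of the same subtitle is not penalized; only strokes that the first frame does not contain are counted.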

At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus, including an acquisition module, a comparison module, and an operation module. The acquisition module is configured to acquire a first video frame and a second video frame of a video. The first video frame and the second video frame are temporally adjacent, and the first video frame is the forward frame of the second video frame. The comparison module is configured to obtain, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. The operation module is configured to determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.

At least one embodiment of the present disclosure further provides a video frame interpolation processing apparatus, including a processor and a memory. The memory includes one or more computer program modules, which are stored in the memory and configured to be executed by the processor. The one or more computer program modules include instructions for performing the video frame interpolation processing method of any of the embodiments described above.

At least one embodiment of the present disclosure also provides a non-transitory readable storage medium having computer instructions stored thereon. The computer instructions, when executed by a processor, perform the video interpolation processing method of any of the embodiments described above.

Drawings

To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly described below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.

FIG. 1 is a schematic diagram of a video frame interpolation method according to at least one embodiment of the present disclosure;

FIG. 2 is a flowchart of a video frame interpolation processing method according to at least one embodiment of the present disclosure;

FIG. 3 is a flowchart of a method for determining subtitle switching according to at least one embodiment of the present disclosure;

FIG. 4 is a flowchart of a text recognition method according to at least one embodiment of the present disclosure;

FIG. 5 is a flowchart of another method for determining whether a subtitle switch occurs according to at least one embodiment of the present disclosure;

FIG. 6 is a schematic block diagram of still another method for determining whether a subtitle switch occurs according to at least one embodiment of the present disclosure;

FIG. 7 is a schematic diagram of another video frame interpolation processing method according to at least one embodiment of the present disclosure;

FIG. 8 is a schematic flowchart of a post-processing method according to at least one embodiment of the present disclosure;

FIG. 9 is a schematic diagram of another video frame interpolation processing method according to at least one embodiment of the present disclosure;

FIG. 10 is a schematic block diagram of yet another video frame interpolation processing method according to at least one embodiment of the present disclosure;

FIG. 11 is a schematic block diagram of a video frame interpolation processing apparatus according to at least one embodiment of the present disclosure;

FIG. 12 is a schematic block diagram of another video frame interpolation processing apparatus according to at least one embodiment of the present disclosure;

FIG. 13 is a schematic block diagram of yet another video frame interpolation processing apparatus according to at least one embodiment of the present disclosure;

FIG. 14 is a schematic block diagram of a non-transitory readable storage medium according to at least one embodiment of the present disclosure; and

FIG. 15 is a schematic block diagram of an electronic device according to at least one embodiment of the present disclosure.

Detailed Description

To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments without inventive effort fall within the scope of protection of the present disclosure.

Flowcharts are used in the present disclosure to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in exactly the order shown. Rather, the steps may be processed in reverse order or simultaneously as needed, and other operations may be added to or removed from these processes.

Unless otherwise defined, technical or scientific terms used in the present disclosure have the ordinary meaning understood by a person of ordinary skill in the art to which the present disclosure belongs. The terms "first", "second", and the like used in the present disclosure do not denote any order, quantity, or importance, but are used only to distinguish one element from another. Likewise, the terms "a", "an", "the", and similar terms do not denote a limitation of quantity, but rather denote the presence of at least one. The words "comprising", "comprises", and the like mean that the elements or items preceding the word include the elements or items listed after the word and their equivalents, without excluding other elements or items. The term "connected" and similar terms are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like are used only to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.

Fig. 1 is a schematic diagram of a video frame inserting method according to at least one embodiment of the present disclosure.

As shown in FIG. 1, video frame interpolation techniques typically synthesize an intermediate frame between two consecutive frames of a video to increase the frame rate and enhance visual quality. Video frame interpolation can also support applications such as slow-motion generation, video compression, and training-data generation for video motion deblurring. For example, an optical flow prediction algorithm may be used to predict an intermediate frame, which is then inserted between the two frames. Optical flow describes the apparent motion of each pixel between frames and is commonly visualized as a color image in which color encodes the direction of motion. An optical flow prediction algorithm typically predicts the in-between frame from two frames of the video, and once the predicted frames are inserted, the video appears smoother. For example, as shown in FIG. 1, a network estimates intermediate flow information from the two input consecutive frames, a rough result is obtained by backward warping the input frames, and this rough result is fed into a fusion network together with the input frames and the intermediate flow information to finally obtain the intermediate frame.
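The backward-warping step of FIG. 1 can be illustrated with a minimal pure-Python sketch. It assumes the two intermediate flows (from the intermediate time t back to frame 0, and from t to frame 1) are already given, whereas in the described pipeline a network estimates them; it samples with bilinear interpolation and clamps at the borders; and it replaces the fusion network with a plain linear blend. It therefore only produces the "rough result", not the final intermediate frame. All function names are illustrative.

```python
from typing import List, Tuple

Image = List[List[float]]
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) displacement


def sample_bilinear(img: Image, x: float, y: float) -> float:
    """Bilinear lookup at a fractional position, clamped to the image border."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = img[y0][x0] * (1 - ax) + img[y0][x1] * ax
    bottom = img[y1][x0] * (1 - ax) + img[y1][x1] * ax
    return top * (1 - ay) + bottom * ay


def backward_warp(img: Image, flow: Flow) -> Image:
    # Each output pixel looks up where it came from in the input frame.
    return [[sample_bilinear(img, x + flow[y][x][0], y + flow[y][x][1])
             for x in range(len(img[0]))] for y in range(len(img))]


def rough_intermediate(i0: Image, i1: Image, flow_t0: Flow, flow_t1: Flow,
                       t: float = 0.5) -> Image:
    """Warp both inputs to time t and blend them linearly (stand-in for fusion)."""
    w0, w1 = backward_warp(i0, flow_t0), backward_warp(i1, flow_t1)
    return [[(1 - t) * a + t * b for a, b in zip(r0, r1)] for r0, r1 in zip(w0, w1)]


# A bright pixel moves from x=0 to x=2; at t=0.5 it should appear at x=1.
i0 = [[1, 0, 0, 0]]
i1 = [[0, 0, 1, 0]]
out = rough_intermediate(i0, i1, [[(-1.0, 0.0)] * 4], [[(1.0, 0.0)] * 4])
print(out[0])  # [0.5, 1.0, 0.0, 0.0]
```

The 0.5 at x=0 is a border-clamping artifact, the kind of error the fusion stage is meant to correct. When the two frames show different subtitles or scenes, no consistent flow exists, and such artifacts become the distortion discussed below.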

At present, commonly used video frame interpolation algorithms do not handle distortion well, for example, distortion caused by scene switches or subtitle switches in the video. This is because most video frame interpolation algorithms rely on information from both the preceding and the following frames of the video. When the subtitle, scene, or the like switches between the preceding and following frames, the optical flow between them cannot be estimated accurately, and significant distortion therefore occurs.

To overcome at least the above technical problem, at least one embodiment of the present disclosure provides a video frame interpolation processing method, including: acquiring a first video frame and a second video frame of a video; acquiring, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame; and determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. The first video frame and the second video frame are temporally adjacent, and the first video frame is the forward frame of the second video frame. The first comparison result indicates whether there is a picture switch between the first video frame and the second video frame.

Accordingly, at least one embodiment of the present disclosure also provides a video interpolation processing apparatus and a non-transitory readable storage medium corresponding to the above video interpolation processing method.

The video frame interpolation processing method provided by at least one embodiment of the present disclosure can avoid the obvious distortion caused by picture switches between video frames during frame interpolation and keep the video smooth, thereby improving the user's viewing experience.

The video frame interpolation processing method provided according to at least one embodiment of the present disclosure is described below in a non-limiting manner through several examples or embodiments. As described below, different features of these specific examples or embodiments may be combined with each other where they do not conflict, yielding new examples or embodiments that also fall within the scope of protection of the present disclosure.

Fig. 2 is a flow chart of a video frame inserting processing method according to at least one embodiment of the present disclosure.

At least one embodiment of the present disclosure provides a video frame interpolation processing method 10, as shown in FIG. 2. For example, the video frame interpolation processing method 10 may be applied in any scenario that requires video frame interpolation, such as TV series, movies, documentaries, advertisements, music videos (MVs), and other video products and services, and may also be applied in other fields; the embodiments of the present disclosure are not limited in this respect. As shown in FIG. 2, the video frame interpolation processing method 10 may include the following steps S101 to S103.

Step S101: acquire a first video frame and a second video frame of a video. The first video frame and the second video frame are temporally adjacent, and the first video frame is the forward frame of the second video frame.

Step S102: obtain, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame. The first comparison result indicates whether there is a picture switch between the first video frame and the second video frame.

Step S103: determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame.
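Steps S101 to S103 can be sketched as a single pass over adjacent frame pairs. This is a minimal illustration that assumes the picture-switch test and the interpolator are supplied by the caller as functions (`picture_switch` and `interpolate` are hypothetical names), since the disclosure leaves both open; the toy example uses numbers as frames.

```python
from typing import Callable, List, Sequence, TypeVar

F = TypeVar("F")


def process_video(frames: Sequence[F],
                  picture_switch: Callable[[F, F], bool],
                  interpolate: Callable[[F, F], F]) -> List[F]:
    if not frames:
        return []
    out: List[F] = [frames[0]]
    # S101: walk over temporally adjacent (first, second) frame pairs
    for first, second in zip(frames, frames[1:]):
        # S102: first comparison result (True means a picture switch exists)
        switched = picture_switch(first, second)
        # S103: insert an intermediate frame only when there is no switch
        if not switched:
            out.append(interpolate(first, second))
        out.append(second)
    return out


# Toy "frames" are numbers; a jump larger than 50 stands in for a picture switch.
print(process_video([0, 10, 100, 110],
                    lambda a, b: abs(b - a) > 50,
                    lambda a, b: (a + b) / 2))  # [0, 5.0, 10, 100, 105.0, 110]
```

No frame is synthesized across the 10 to 100 jump, which is exactly where an optical-flow interpolator would otherwise produce a distorted blend.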

It should be noted that, in the embodiments of the present disclosure, "first video frame" and "second video frame" refer to any two temporally consecutive (adjacent) images or video frames in a video or video frame sequence. The "first video frame" refers to the earlier of two temporally adjacent frames, the "second video frame" refers to the later of the two, and the "third video frame" refers to an intermediate (interpolated) frame inserted between the two temporally adjacent frames. The "first video frame", "second video frame", and "third video frame" are not limited to a particular frame image or to a particular order. The "first comparison result" refers to a comparison result between two adjacent frames of the video and is likewise not limited to a specific comparison result or a specific order. It should further be noted that, in the embodiments of the present disclosure, either the forward frame or the backward frame of two adjacent frames may be used as the reference, as long as the same choice is used consistently throughout the video frame interpolation processing method.

For example, in at least one embodiment of the present disclosure, for step S102, to avoid distortion caused by a picture switch between the preceding and following frames of the video, the adjacent first and second video frames may be compared to determine whether there is a picture switch between them.

For example, in at least one embodiment of the present disclosure, for step S103, whether to perform a frame interpolation operation between the first video frame and the second video frame may be determined based on their first comparison result. For example, in some examples, the frame interpolation operation may compute an intermediate (interpolated) frame from the adjacent first and second video frames by an optical flow prediction method.

It should be noted that the embodiments of the present disclosure do not particularly limit how the intermediate (interpolated) frame, that is, the third video frame, is obtained; various conventional frame interpolation methods may be used to obtain it. For example, the intermediate frame may be generated from two adjacent video frames, from more adjacent frames, or from one or more specific video frames; the present disclosure is not limited in this respect, and the choice may be made according to the actual situation.

For example, in at least one embodiment of the present disclosure, step S103 may include: in response to the first comparison result indicating that there is no picture switch between the first video frame and the second video frame, determining to insert a frame between the first video frame and the second video frame; and in response to the first comparison result indicating that a picture switch exists between the first video frame and the second video frame, determining not to insert a frame between them.

Therefore, in the video frame interpolation processing method 10 provided in at least one embodiment of the present disclosure, the frame interpolation operation is performed selectively according to the comparison result between adjacent video frames. This effectively avoids the obvious distortion caused by picture switches during frame interpolation, keeps the video smooth, and thereby improves the user's viewing experience.

For example, in at least one embodiment of the present disclosure, the picture switch between the first video frame and the second video frame may include a subtitle switch, a scene switch, and the like; the embodiments of the present disclosure are not limited in this respect.

For example, in one example, the subtitle in the first video frame is "Where do you want to go?" and the subtitle in the second video frame is "I'm going to school." Because the two subtitles are different, a subtitle switch can be regarded as having occurred between the first video frame and the second video frame. Note that the embodiments of the present disclosure do not limit the subtitle content.

For another example, if the scene in the first video frame is a shopping mall and the scene in the second video frame is a school, the scenes of the two frames are different, and a scene switch can be considered to have occurred between the first video frame and the second video frame. In the embodiments of the present disclosure, the scene in each video frame may be any scene, such as a mall, a school, or a scenic spot; the embodiments of the present disclosure are not limited in this respect.

For example, in at least one embodiment of the present disclosure, for step S102, obtaining the first comparison result between the first video frame and the second video frame based on the two frames may include: determining, based on whether the subtitle contents of the first video frame and the second video frame are the same, whether a subtitle switch exists between them.

For example, in at least one embodiment of the present disclosure, to determine whether a subtitle switch occurs between two adjacent frames, the beginning and end of each spoken sentence in the video's audio may be located, and the two video frames corresponding to them may be marked according to the time information of the corresponding audio, so as to determine whether the corresponding subtitle changes.

Fig. 3 is an exemplary flowchart of a method for determining subtitle switching according to at least one embodiment of the present disclosure.

For example, in at least one embodiment of the present disclosure, determining whether subtitle switching exists between the first video frame and the second video frame based on whether subtitle contents of the first video frame and the second video frame are the same may include the following steps S201 to S203, as shown in fig. 3.

S201: acquire an audio segment corresponding to the first video frame.

S202: acquire, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment.

S203: determine, based on the start video frame and the end video frame, whether a subtitle switch exists between the first video frame and the second video frame.

It should be noted that, in the embodiments of the present disclosure, "start video frame" and "end video frame" refer to two video frames determined from the time information of the corresponding audio segment; they are not limited to a specific video frame or to a specific order.

For example, in at least one embodiment of the present disclosure, for step S201, the audio data of the video may be input into a speech recognition system for speech segmentation, to obtain speech recognition results and the corresponding time information. For example, the time information includes the start time and the end time of each audio segment. Based on the speech recognition results and the corresponding time information, the audio segment corresponding to the first video frame can be obtained.

For example, in at least one embodiment of the present disclosure, for step S202, the start video frame and the end video frame corresponding to the identified audio segment may be determined from the time information of that audio segment.

It should be noted that the embodiments of the present disclosure do not limit the speech recognition method, and any effective speech recognition method may be used.

For example, in at least one embodiment of the present disclosure, step S203 may include: in response to the second video frame being between the start video frame and the end video frame, determining that there is no subtitle switch between the first video frame and the second video frame; and in response to the second video frame not being between the start video frame and the end video frame, determining that there is a subtitle switch between the first video frame and the second video frame.

For example, in at least one example of the present disclosure, a video includes a sequence of temporally adjacent video frames, e.g., video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. Assume that the first video frame is video frame 2 and that the audio segment corresponding to it is "where do you want to go". Based on the time information of the audio segment (e.g., the start time and the end time of the sentence), the start video frame corresponding to the audio segment is determined to be video frame 1 and the end video frame to be video frame 4. This indicates that the pictures of video frames 1 to 4 all display the subtitle "where do you want to go", i.e., the same subtitle content. For example, if the second video frame is video frame 3, which lies between video frame 1 and video frame 4, there is no subtitle switching between the first video frame and the second video frame. For another example, if the second video frame is video frame 5, which does not lie between video frame 1 and video frame 4, a subtitle switch occurs between the first video frame and the second video frame. Through the above operations, the video frames at which subtitle switching occurs can be determined from the audio corresponding to the video.
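Steps S201 to S203 and the worked example above can be sketched in Python as follows. This is an illustrative sketch, not the disclosed implementation: the `SpeechSegment` structure, the 1-based frame numbering, and the rule that frame k is shown during [(k - 1)/fps, k/fps) are all assumptions, and the second sentence in the example ("next sentence") is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeechSegment:
    # Hypothetical container for one recognized sentence; a real speech
    # recognizer's output format will differ.
    text: str
    start_time: float  # seconds
    end_time: float    # seconds


def frame_range(seg: SpeechSegment, fps: float) -> Tuple[int, int]:
    """S202: map a sentence's time span to (start video frame, end video frame).

    Frames are numbered from 1 and frame k is assumed to be shown during
    [(k - 1) / fps, k / fps). The small epsilon guards against float error.
    """
    start_frame = math.floor(seg.start_time * fps + 1e-9) + 1
    end_frame = math.ceil(seg.end_time * fps - 1e-9)
    return start_frame, end_frame


def subtitle_switch_by_audio(first: int, second: int,
                             segments: List[SpeechSegment],
                             fps: float) -> bool:
    """Return True if a subtitle switch is judged to lie between the two frames."""
    for seg in segments:  # S201: find the sentence covering the first frame
        start_frame, end_frame = frame_range(seg, fps)
        if start_frame <= first <= end_frame:
            # S203: no switch iff the second frame lies in the same span
            return not (start_frame <= second <= end_frame)
    # Assumption: with no speech at the first frame the audio gives no
    # evidence of a switch; another check (e.g. text recognition) could be used.
    return False


# Worked example from the text, assuming 1 frame per second for simplicity.
segments = [SpeechSegment("where do you want to go", 0.0, 4.0),
            SpeechSegment("next sentence", 4.0, 6.0)]
print(subtitle_switch_by_audio(2, 3, segments, fps=1.0))  # False
print(subtitle_switch_by_audio(2, 5, segments, fps=1.0))  # True
```

The frame range is treated as inclusive on both ends so that "between the start video frame and the end video frame" in S203 includes the boundary frames themselves.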

For example, in at least one embodiment of the present disclosure, for determining whether subtitle switching occurs between adjacent video frames, a text recognition method may be used in addition to the determination by audio. For example, in some examples, caption content displayed on the first video frame and the second video frame is obtained by employing a text recognition algorithm, and a comparison is made to determine whether caption switching has occurred between the first video frame and the second video frame. It should be noted that, the embodiment of the present disclosure does not specifically limit the text recognition algorithm, as long as the text content can be recognized.

Fig. 4 is a flow chart illustrating a text recognition method according to at least one embodiment of the present disclosure.

For example, in at least one embodiment of the present disclosure, as shown in fig. 4, a text recognition algorithm can output the coordinates of the recognized text in addition to its content. For example, in some examples, the obtained text coordinates may be the coordinates of the four vertices (upper left, lower left, upper right, and lower right) of the complete subtitle. For example, in some examples, text detection is performed on an input image (or a single video frame) to determine the area where the text is located; each character is then segmented separately; a single-character classifier (e.g., an algorithm based on text feature vector correlation, an algorithm based on a neural network, etc.) then classifies each character (a candidate is accepted as a character if its confidence is greater than a certain threshold); and finally the recognized text and its coordinates are output. It should be noted that the embodiments of the present disclosure do not limit the specific operations of the text recognition method, and any effective text recognition method may be used.
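The last two stages described above (confidence thresholding of single-character results, then outputting the text and the four subtitle vertices) might look like the following sketch. The `(character, confidence, box)` input format, the 0.5 default threshold, and the left-to-right ordering rule are assumptions for illustration; detection and the classifier itself are not shown.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), pixels
Point = Tuple[int, int]          # (x, y), y grows downward
Quad = Tuple[Point, Point, Point, Point]


def assemble_caption(candidates: List[Tuple[str, float, Box]],
                     conf_threshold: float = 0.5) -> Tuple[str, Optional[Quad]]:
    """Combine per-character classifier outputs into caption text + vertices.

    `candidates` holds hypothetical (character, confidence, box) triples from
    a single-character classifier; only candidates whose confidence exceeds
    `conf_threshold` are accepted as characters.
    """
    kept = [(ch, box) for ch, conf, box in candidates if conf > conf_threshold]
    if not kept:
        return "", None
    kept.sort(key=lambda item: item[1][0])  # left-to-right reading order
    text = "".join(ch for ch, _ in kept)
    x0 = min(box[0] for _, box in kept)
    y0 = min(box[1] for _, box in kept)
    x1 = max(box[2] for _, box in kept)
    y1 = max(box[3] for _, box in kept)
    # Upper left, lower left, upper right, lower right, as listed in the text.
    return text, ((x0, y0), (x0, y1), (x1, y0), (x1, y1))
```

Taking the union of the accepted character boxes gives an axis-aligned quadrilateral for the whole subtitle, which is enough for the sub-image cropping used later.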

For example, in at least one embodiment of the present disclosure, determining whether subtitle switching occurs between adjacent frames of a video (the first video frame and the second video frame) may include: acquiring first recognized text content of the first video frame; acquiring second recognized text content of the second video frame; and in response to the first recognized text content being identical to the second recognized text content, determining that no subtitle switching exists between the first video frame and the second video frame.

It should be noted that, in the embodiments of the present disclosure, "first recognized text content" and "second recognized text content" refer to the text content obtained by performing a text recognition operation on the corresponding video frame, and are not limited to specific text content or to a specific order.

For example, in at least one embodiment of the present disclosure, to recognize subtitles more accurately, the region to which the text recognition operation is applied may be set in advance. Since the display position of subtitles in a video picture is generally fixed, the approximate area where the subtitles are located can be preset.

Fig. 5 is a flowchart of another method for determining subtitle switching according to at least one embodiment of the present disclosure.

In general, a text recognition algorithm cannot achieve 100% accuracy; for example, the character segmentation result may be inaccurate, which causes further problems. For example, in some examples, text at a location other than the subtitle is recognized, so that the text sequences recognized from the preceding and following frames fail to match. To determine more accurately whether the subtitle has switched, the video interpolation processing method 10 provided in the embodiments of the present disclosure may include the following steps S301 to S303, as shown in fig. 5.

Step S301: in response to the first recognized text content and the second recognized text content being different, acquiring a first sub-image of the first video frame. The first sub-image corresponds to first subtitle content of the first video frame.

Step S302: a second sub-image of a second video frame is acquired, the second sub-image corresponding to second subtitle content of the second video frame.

Step S303: based on the first sub-image and the second sub-image, it is determined whether a subtitle switch exists between the first video frame and the second video frame.

It should be noted that, in the embodiments of the present disclosure, "first subtitle content" and "second subtitle content" refer to the subtitle content displayed in the corresponding video frame. They are not limited to specific subtitle content or to a specific order.

It should be further noted that, in the embodiment of the present disclosure, the "first sub-image", "second sub-image", and "third sub-image" are used to refer to an image of an area where a subtitle is located in a corresponding video frame, respectively. The "first sub-image", "second sub-image", and "third sub-image" are not limited to a specific image, nor to a specific order.

For example, in at least one embodiment of the present disclosure, a text recognition operation is performed on a certain video frame, coordinates of subtitles in the video frame (e.g., coordinates of top left, bottom left, top right, bottom right of a complete subtitle) are recognized, and based on the coordinates, an area where the subtitles in the video frame are located may be obtained, so as to obtain a sub-image of the video frame corresponding to the subtitle content.
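Cropping the sub-image from the recognized vertices can be sketched with NumPy slicing. The half-open box convention and the `offset` parameter (for the case where recognition ran on a preset subtitle region rather than the whole frame) are assumptions made for this illustration.

```python
from typing import Sequence, Tuple

import numpy as np

Point = Tuple[int, int]  # (x, y) pixel coordinates


def crop_subtitle(frame: np.ndarray, vertices: Sequence[Point],
                  offset: Point = (0, 0)) -> np.ndarray:
    """Cut out the sub-image enclosed by the recognized subtitle vertices.

    The box is treated as half-open [x_min, x_max) x [y_min, y_max).
    `offset` (hypothetical) is the top-left corner of a preset subtitle
    region when the vertices are relative to that region.
    """
    ox, oy = offset
    xs = [x + ox for x, _ in vertices]
    ys = [y + oy for _, y in vertices]
    height, width = frame.shape[:2]
    # Clip to the frame so out-of-range coordinates cannot break slicing.
    x0, x1 = max(min(xs), 0), min(max(xs), width)
    y0, y1 = max(min(ys), 0), min(max(ys), height)
    return frame[y0:y1, x0:x1]
```

Because slicing works on the first two axes only, the same function handles grayscale (H, W) and color (H, W, C) frames.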

For example, in at least one embodiment of the present disclosure, step S303 may include: determining a first similarity between the first sub-image and the second sub-image; in response to the first similarity being greater than a first threshold, determining that no subtitle switching exists between the first video frame and the second video frame; and in response to the first similarity not being greater than the first threshold, determining that a subtitle switch exists between the first video frame and the second video frame.

It should be noted that, in the embodiments of the present disclosure, the "first similarity" refers to the image similarity between the subtitle sub-images of two adjacent video frames, and the "second similarity" refers to the image similarity between the two adjacent video frames themselves. The "first similarity" and the "second similarity" are not limited to a specific similarity or to a specific order.

It should be further noted that, in the embodiments of the present disclosure, the values of the "first threshold", the "second threshold", and the "third threshold" are not limited and may be set according to actual requirements. The "first threshold", the "second threshold", and the "third threshold" are not limited to specific values or to a particular order.

For example, in the embodiments of the present disclosure, the image similarity between two images may be calculated in various ways, for example, by a cosine similarity algorithm, a histogram algorithm, a perceptual hash algorithm, or an algorithm based on mutual information. The embodiments of the present disclosure do not limit the method for calculating image similarity, which can be selected according to actual requirements.

For example, in at least one embodiment of the present disclosure, the structural similarity (SSIM) algorithm may be used to calculate the similarity between two images. SSIM is a full-reference image quality metric that measures image similarity in terms of luminance, contrast, and structure. SSIM is calculated as follows:

$$\mathrm{SSIM}(x,y)=\frac{(2\mu_x\mu_y+c_1)(2\sigma_{xy}+c_2)}{(\mu_x^2+\mu_y^2+c_1)(\sigma_x^2+\sigma_y^2+c_2)}$$

where $\mu_x$ and $\mu_y$ are the mean values of $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $x$ and $y$, and $\sigma_{xy}$ is the covariance of $x$ and $y$. $c_1=(k_1L)^2$ and $c_2=(k_2L)^2$ are constants that keep the computation stable, where $L$ is the dynamic range of the pixel values, $k_1=0.01$, and $k_2=0.03$. SSIM takes values from -1 to 1; the larger the value, the smaller the image distortion. When the two images are identical, SSIM equals 1.
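The SSIM formula can be evaluated directly with NumPy. Common implementations (e.g., scikit-image's `structural_similarity`) apply the formula over local sliding windows and average the results; this sketch instead applies it once over the whole sub-image, a simplification chosen for brevity, with $L=255$ assumed for 8-bit images.

```python
import numpy as np


def ssim_global(x: np.ndarray, y: np.ndarray, dynamic_range: float = 255.0,
                k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM of two equally sized grayscale images."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (k1 * dynamic_range) ** 2  # c1 = (k1 * L)^2
    c2 = (k2 * dynamic_range) ** 2  # c2 = (k2 * L)^2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(numerator / denominator)
```

Because $c_1$ and $c_2$ are positive, the denominator never vanishes, even for two flat (constant) images.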

For example, in at least one embodiment of the present disclosure, the "first threshold" may be set to 0.6, or may be set to 0.8. It should be noted that, the value of the "first threshold" is not limited in the embodiments of the disclosure, and may be set according to actual requirements.

Fig. 6 is a schematic block diagram of yet another method for determining whether subtitles are switched according to at least one embodiment of the present disclosure.

In at least one embodiment of the present disclosure, for example, as shown in fig. 6, text recognition operations are performed on the approximate subtitle region Z₀ of the first video frame I₀ and on the approximate subtitle region Z₁ of the second video frame I₁, respectively, to obtain first recognized text content T₀ and second recognized text content T₁, together with the corresponding coordinates C₀ and C₁. The text similarity between T₀ and T₁ is then calculated to determine whether they are identical. If the similarity is greater than a certain threshold, T₀ and T₁ are considered identical, that is, the subtitle has not switched. If the similarity is not greater than the threshold, the similarity between the first sub-image corresponding to the subtitle region Z₀ in the first video frame I₀ and the second sub-image corresponding to the subtitle region Z₁ in the second video frame I₁ is further determined. For example, as shown in fig. 6, it is determined whether the SSIM between the images within the recognized coordinate ranges C₀ and C₁ (i.e., the first sub-image and the second sub-image described above) is greater than a threshold. If the SSIM is greater than the threshold (e.g., 0.8), no subtitle switching has occurred; if the SSIM is not greater than the threshold (e.g., 0.8), the subtitle has switched.
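The two-stage decision of fig. 6 can be sketched as below. The text similarity uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in (the disclosure does not fix the measure), the text threshold of 0.9 is a hypothetical value, and the image similarity (e.g., SSIM) is passed in as a callable. Cropping both sub-images to a common size is an added assumption, since C₀ and C₁ need not enclose equally sized regions.

```python
from difflib import SequenceMatcher
from typing import Callable

import numpy as np


def subtitle_switched(text0: str, text1: str,
                      sub0: np.ndarray, sub1: np.ndarray,
                      image_similarity: Callable[[np.ndarray, np.ndarray], float],
                      text_threshold: float = 0.9,
                      first_threshold: float = 0.8) -> bool:
    """Return True if a subtitle switch is judged to exist.

    text0/text1 play the role of T0/T1; sub0/sub1 are the images inside C0/C1.
    """
    # Stage 1: compare recognized texts (difflib ratio used as a stand-in).
    if SequenceMatcher(None, text0, text1).ratio() > text_threshold:
        return False  # texts judged identical: no subtitle switch
    # Stage 2: recognition may be wrong, so compare the subtitle sub-images.
    # Assumption: crop both to their common size so they can be compared.
    h = min(sub0.shape[0], sub1.shape[0])
    w = min(sub0.shape[1], sub1.shape[1])
    first_similarity = image_similarity(sub0[:h, :w], sub1[:h, :w])
    return not first_similarity > first_threshold
```

Checking text first keeps the cheaper comparison on the common path; the image comparison only runs when the recognized texts disagree.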

It should be noted that the embodiments of the present disclosure do not limit the method for calculating text similarity. For example, the text similarity may be calculated using the Euclidean distance, the Manhattan distance, cosine similarity, and the like. It should also be noted that the threshold for the text similarity is not limited in the embodiments of the present disclosure and may be set according to actual requirements.
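As one instance of the cosine-similarity option, each recognized string can be represented as a vector of character counts. This bag-of-characters representation is an assumption for illustration; note that it ignores character order.

```python
import math
from collections import Counter


def text_cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between character-count vectors of two strings."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[ch] * cb[ch] for ch in ca)  # Counter returns 0 for absent keys
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        # Assumption: two empty strings (no subtitle in either frame) match.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)
```

Because order is ignored, anagram-like strings such as "ab" and "ba" score 1.0; an order-sensitive measure (e.g., on character n-grams) would avoid that if it matters.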

For example, in at least one embodiment of the present disclosure, in addition to a subtitle switch, a picture switch may include a scene switch. For example, step S102 may include: determining whether there is a scene cut between the first video frame and the second video frame based on whether the scenes of the two frames are the same.

For example, in at least one embodiment of the present disclosure, when a video involves a scene cut, the image similarity (e.g., the SSIM value) between the frames immediately before and after the cut drops significantly. Therefore, scene cuts can be detected by calculating image similarity.

For example, in at least one embodiment of the present disclosure, determining whether a scene cut occurs between two adjacent video frames may include the following steps: acquiring a second similarity between the first video frame and the second video frame; in response to the second similarity being greater than a second threshold, determining that no scene cut exists between the first video frame and the second video frame; and in response to the second similarity not being greater than the second threshold, determining that a scene cut exists between the first video frame and the second video frame.

For example, in at least one embodiment of the present disclosure, the second similarity may be the structural similarity (SSIM), or the similarity between pictures (i.e., video frames) may be calculated by, for example, a perceptual hash algorithm or a histogram algorithm; the embodiments of the present disclosure do not limit the algorithm for calculating image similarity.
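As an example of the histogram option, the second similarity can be taken as the histogram intersection of two grayscale frames. The 64-bin histogram and the second threshold of 0.5 are hypothetical values chosen only for illustration.

```python
import numpy as np


def histogram_similarity(a: np.ndarray, b: np.ndarray, bins: int = 64) -> float:
    """Histogram intersection of two 8-bit grayscale frames, in [0, 1]."""
    ha, _ = np.histogram(a, bins=bins, range=(0, 256))
    hb, _ = np.histogram(b, bins=bins, range=(0, 256))
    ha = ha / ha.sum()  # normalize so frames of any size are comparable
    hb = hb / hb.sum()
    return float(np.minimum(ha, hb).sum())


def scene_cut(a: np.ndarray, b: np.ndarray,
              second_threshold: float = 0.5, bins: int = 64) -> bool:
    """True if the second similarity is not greater than the second threshold."""
    return histogram_similarity(a, b, bins) <= second_threshold
```

Histograms ignore spatial layout, so they tolerate camera motion within a scene better than pixel-wise measures, at the cost of missing cuts between scenes with similar brightness distributions.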

In the embodiments of the present disclosure, 2x frame interpolation is taken as an example. For example, the frame rate is raised from 30 fps (frames per second) to 60 fps, i.e., the number of frames transmitted per second increases from 30 to 60. When a scene switch or a subtitle switch is detected between two adjacent video frames, the current frame insertion is skipped, and, to keep the total number of frames consistent, two frames are inserted at the next frame insertion. However, when picture switches (scene switches or subtitle switches) occur twice in succession, two frame insertions are skipped; if only two frames are then inserted at the next frame insertion, the video as a whole ends up with fewer frames than expected.

Fig. 7 is a schematic diagram of another video frame inserting processing method according to at least one embodiment of the present disclosure.

For example, to avoid the above frame shortfall, in at least one embodiment of the present disclosure, the video interpolation processing method 10 may include, in addition to steps S101 to S103: setting a first frame insertion flag; and in response to a picture switch between the first video frame and the second video frame, modifying the first frame insertion flag to a second frame insertion flag.

It should be noted that, in the embodiments of the present disclosure, the "first frame insertion flag", "second frame insertion flag", and "third frame insertion flag" refer to the frame insertion flag at different time points or stages and indicate how many consecutive picture switches have occurred in the video. They are not limited to specific values or to a specific order.

For example, in some examples, it is assumed that the video includes a sequence of temporally adjacent video frames, e.g., video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on. In one example, a frame insertion Flag is set and initialized to (0, 0). Two adjacent video frames (e.g., a first video frame and a second video frame) are input; assume that the first video frame is video frame 2 and the second video frame is video frame 3. Whether there is a picture switch (subtitle switch or scene switch) between video frame 2 and video frame 3 is determined by the method described in the above embodiments. If there is a picture switch between video frame 2 and video frame 3, the frame insertion Flag is modified from (0, 0) to (0, 1). For example, in some examples, when it is determined that a picture switch occurs between two adjacent video frames, a value "1" is appended to the frame insertion Flag (0, 0) and the oldest value "0" is popped, so the updated frame insertion Flag is (0, 1). When it is determined that no picture switch occurs between two adjacent video frames, a value "0" is appended to the frame insertion Flag (0, 0) and the oldest value "0" is popped, so the updated frame insertion Flag remains (0, 0).
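The append-and-pop update of the two-slot frame insertion Flag maps naturally onto a fixed-length queue. In this sketch, Python's `collections.deque` with `maxlen=2` is an implementation choice, not something the text prescribes: appending automatically discards the oldest value.

```python
from collections import deque
from typing import Deque


def new_flag() -> Deque[int]:
    # Two-slot flag initialized to (0, 0); maxlen=2 makes append() drop the oldest value.
    return deque([0, 0], maxlen=2)


def update_flag(flag: Deque[int], picture_switch: bool) -> Deque[int]:
    # Append 1 for a picture switch, 0 otherwise; the oldest value is popped.
    flag.append(1 if picture_switch else 0)
    return flag


flag = new_flag()
update_flag(flag, True)   # (0, 0) -> (0, 1)
update_flag(flag, True)   # (0, 1) -> (1, 1)
```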

It should be noted that the frame insertion flag may be initialized to other values, for example, (1, 1), (0, 0), etc., which are not limited by the embodiments of the present disclosure.

For example, in at least one embodiment of the present disclosure, in response to a picture switch existing between the first video frame and the second video frame, a fourth video frame is acquired; a second comparison result between the second video frame and the fourth video frame is acquired based on the two frames; and whether to insert frames between the second video frame and the fourth video frame is determined based on the second comparison result. The fourth video frame and the second video frame are temporally adjacent, the second video frame being a forward frame of the fourth video frame. The second comparison result indicates whether there is a picture switch between the second video frame and the fourth video frame.

For example, in at least one embodiment of the present disclosure, determining whether to insert frames between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that no picture switch exists between the second video frame and the fourth video frame, inserting multiple video frames between the second video frame and the fourth video frame, where the number of inserted frames is based on the second frame insertion flag.

For example, in at least one embodiment of the present disclosure, determining whether to insert frames between the second video frame and the fourth video frame based on the second comparison result includes: in response to the second comparison result indicating that a picture switch exists between the second video frame and the fourth video frame, determining that no frame is inserted between the second video frame and the fourth video frame, and modifying the second frame insertion flag to a third frame insertion flag. The third frame insertion flag indicates the number of frames to be inserted at the next frame insertion.

The "fourth video frame" is used to refer to a subsequent frame image temporally adjacent to the "second video frame", and the fourth video frame is not limited to a specific frame image, nor is it limited to a specific order. The "second comparison result" is used to refer to a comparison result between two adjacent frame images (a second video frame and a fourth video frame) in a video, and is not limited to a specific one of the comparison results nor to a specific order.

For example, in some examples, it is assumed that the video includes a sequence of temporally adjacent video frames, e.g., video frame 1, video frame 2, video frame 3, video frame 4, video frame 5, and so on, and that the first video frame is video frame 1, the second video frame is video frame 2, and the fourth video frame is video frame 3. As shown in fig. 7, video frame 1 and video frame 2 are input, and it is determined that there is a picture switch (subtitle switch or scene switch) between them; in this case, no frame insertion is performed between video frame 1 and video frame 2, and the frame insertion Flag is set to (0, 1). Then the next two adjacent video frames, video frame 2 and video frame 3, are input, and whether there is a picture switch (subtitle switch or scene switch) between them is judged by the method provided in the above embodiments. For example, if there is no picture switch between video frame 2 and video frame 3, a frame insertion is performed between them. In this case, the frame insertion Flag is (0, 1), indicating that one picture switch has occurred (i.e., no frame was inserted between video frame 1 and video frame 2); to avoid a frame shortfall, two video frames need to be inserted between video frame 2 and video frame 3. For another example, if there is still a picture switch between video frame 2 and video frame 3, no frame insertion is performed between them, and the frame insertion Flag is modified from (0, 1) to (1, 1), i.e., a value "1" is appended to the Flag (0, 1) and the oldest value "0" is popped.
The frame insertion Flag (1, 1) indicates that picture switches have occurred twice in succession in the video frame sequence: there is a picture switch between video frame 1 and video frame 2, and another between video frame 2 and video frame 3. By a similar operation, video frame 3 and video frame 4 are then compared. If there is no picture switch between video frame 3 and video frame 4, a frame insertion may be performed; to avoid a frame shortfall, the frame insertion Flag (1, 1) indicates that 3 video frames need to be inserted between video frame 3 and video frame 4. This ensures the overall integrity of the video after frame insertion.
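The scheduling described above, where the number of frames inserted in a gap is 1 plus the number of skipped insertions recorded in the Flag ((0, 0) gives 1, (0, 1) gives 2, (1, 1) gives 3), can be sketched for a whole sequence as follows. The text does not state how the Flag is updated after a compensating insertion; this sketch assumes it is reset to (0, 0) so that skipped frames are not compensated twice.

```python
from collections import deque
from typing import List


def plan_insertions(switches: List[bool]) -> List[int]:
    """Frames to insert in each gap for 2x interpolation.

    switches[i] is True if a picture switch exists between frame i+1 and
    frame i+2 (frames numbered from 1, as in the text).
    """
    flag = deque([0, 0], maxlen=2)
    counts = []
    for switched in switches:
        if switched:
            counts.append(0)  # no interpolation across a picture switch
            flag.append(1)    # record the skipped insertion; oldest value popped
        else:
            counts.append(1 + sum(flag))  # (0,0)->1, (0,1)->2, (1,1)->3
            # Assumption: reset once the skipped frames are compensated.
            flag = deque([0, 0], maxlen=2)
    return counts


# Fig. 7 scenario: switches between frames 1-2 and 2-3, none afterwards.
print(plan_insertions([True, True, False, False]))  # [0, 0, 3, 1]
```

With at most two consecutive switches, the total number of inserted frames equals the number of gaps, so the output keeps the 2x frame count. A third consecutive switch would leave the Flag at (1, 1) and lose one compensation, which is consistent with the two-switch limit discussed next.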

In practical applications, picture switches rarely occur across many consecutive pairs of adjacent video frames, so the above embodiments of the present disclosure take at most 2 consecutive picture switches as an example and initialize the frame insertion flag to (0, 0). The embodiments of the present disclosure are not limited in this regard, and the flag may be set according to actual needs.

Fig. 8 is a schematic flow chart of a method for post-processing of an inserted frame according to at least one embodiment of the present disclosure.

For example, in at least one embodiment of the present disclosure, the video interpolation processing method 10 further includes the following steps S401 to S403, as shown in fig. 8.

Step S401: in response to a third video frame being inserted between the first video frame and the second video frame, acquiring a first sub-image of the first video frame. The first sub-image corresponds to first subtitle content in the first video frame.

Step S402: acquiring a third sub-image of the third video frame. The third sub-image corresponds to third subtitle content in the third video frame.

Step S403: determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame.

For example, in at least one embodiment of the present disclosure, step S403 may include: acquiring the pixel value of a first pixel in the first sub-image, the pixel value of the first pixel being greater than a third threshold; setting the pixel value of a third pixel of the third sub-image based on the pixel value of the first pixel, the relative position of the third pixel in the third sub-image being the same as the relative position of the first pixel in the first sub-image; and determining whether to replace the third video frame with the first video frame based on the first sub-image and the set third sub-image.

In the embodiments of the present disclosure, that the relative position of the third pixel in the third sub-image is the same as that of the first pixel in the first sub-image means, for example, that the coordinates of the first pixel in a coordinate system whose origin is the top-left vertex of the first sub-image equal the coordinates of the third pixel in a coordinate system whose origin is the top-left vertex of the third sub-image.

As described in detail below with reference to fig. 9, the video interpolation processing method 10 including the operations shown in fig. 8 can solve the problem of subtitle distortion caused by large motion of the subtitle background during frame interpolation. Fig. 9 is a schematic diagram of another video frame inserting processing method according to at least one embodiment of the present disclosure.

For example, in some examples, as shown in fig. 9, after the third video frame is inserted between the first video frame and the second video frame, it may be determined, to improve the frame insertion accuracy, whether the subtitles of the first video frame and the third video frame are identical, i.e., whether subtitle switching occurs. For example, this may be determined by the method for determining subtitle switching between adjacent video frames provided in the above embodiments; for details, refer to the description of fig. 6, which is not repeated here. For example, after it is determined by the method of fig. 6 that there is no subtitle switching between the first video frame and the third video frame, further processing may be performed.

For example, in some examples, because the color of subtitles generally remains stable (e.g., most subtitles are white), pixels whose values are greater than a certain threshold (i.e., the third threshold) may be selected as first pixels in the first sub-image of the first video frame (i.e., the region corresponding to the recognized coordinates C₀). For example, the third threshold is set to 220, where pixel values typically range from 0 to 255. The value of each first pixel is assigned to the co-located pixel (i.e., the third pixel) in the third sub-image (i.e., the region corresponding to the recognized coordinates C_t). For example, in fig. 9, the assigned third sub-image is denoted C_t'. When the subtitle background moves with large amplitude, the distorted subtitle in the interpolated frame usually extends well beyond the strokes of the original characters. Therefore, whether the subtitle in the inserted frame is noticeably deformed can be judged by comparing the first sub-image with the assigned third sub-image.

For example, in at least one embodiment of the present disclosure, the first sub-image is compared with the assigned third sub-image: the pixel values of corresponding pixels are subtracted, and it is determined whether the number of pixels whose absolute difference exceeds a certain threshold (e.g., 150) is greater than another threshold (e.g., 30). If the number of pixels whose absolute difference exceeds 150 is greater than 30, the subtitle of the inserted third video frame is considered noticeably deformed, and the first video frame is directly copied to replace the deformed inserted frame (i.e., the third video frame). Of course, the deformed inserted frame may also be replaced with the second video frame; the embodiments of the present disclosure are not limited in this respect. In this way, the deformation caused by large motion of the subtitle background can be avoided.
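The post-processing of steps S401 to S403 can be sketched with NumPy, using the example thresholds from the text (220, 150, 30). The sketch assumes 8-bit grayscale sub-images of equal size; the function names are hypothetical, and whether to fall back to the first or the second video frame is left as in the text (the first frame is used here).

```python
import numpy as np


def assign_bright_pixels(sub_first: np.ndarray, sub_third: np.ndarray,
                         third_threshold: int = 220) -> np.ndarray:
    """Copy bright (subtitle-like) pixels of the first sub-image onto the
    co-located pixels of the third sub-image, giving C_t'."""
    assigned = sub_third.copy()
    mask = sub_first > third_threshold  # first pixels
    assigned[mask] = sub_first[mask]    # third pixels at the same positions
    return assigned


def subtitle_deformed(sub_first: np.ndarray, sub_third: np.ndarray,
                      third_threshold: int = 220, diff_threshold: int = 150,
                      count_threshold: int = 30) -> bool:
    assigned = assign_bright_pixels(sub_first, sub_third, third_threshold)
    # Widen to int16 so the uint8 subtraction cannot wrap around.
    diff = np.abs(sub_first.astype(np.int16) - assigned.astype(np.int16))
    return int(np.count_nonzero(diff > diff_threshold)) > count_threshold


def postprocess_inserted(first_frame: np.ndarray, third_frame: np.ndarray,
                         sub_first: np.ndarray, sub_third: np.ndarray) -> np.ndarray:
    """S403: replace the inserted frame with a copy of the first frame if deformed."""
    if subtitle_deformed(sub_first, sub_third):
        return first_frame.copy()
    return third_frame
```

Writing the bright pixels into C_t' first makes the genuine subtitle strokes cancel in the difference, so the pixels that still differ strongly are mostly distorted strokes that have spread outside the original characters.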

Fig. 10 is a schematic block diagram of a video frame inserting processing method according to at least one embodiment of the present disclosure.

As shown in fig. 10, the video frame inserting processing method provided in at least one embodiment of the present disclosure can not only solve the deformation problem caused by scene switching and subtitle switching, but also solve the obvious deformation caused by large motion of the subtitle background through post-processing after frame insertion. The operations in the respective blocks of the method shown in fig. 10 are described in detail above and are not repeated here.

Therefore, the video frame inserting processing method 10 provided in at least one embodiment of the present disclosure can solve the problem of significant deformation caused by picture switching between video frames and by large motion of the subtitle background during frame insertion, thereby ensuring the smoothness of the video and improving the viewing experience of users.

It should also be noted that, in the various embodiments of the present disclosure, the execution order of the steps of the video interpolation processing method 10 is not limited; although the steps are described above in a specific order, this does not limit the embodiments of the present disclosure. The steps in the video interpolation processing method 10 may be performed serially or in parallel as required. For example, the video interpolation processing method 10 may also include more or fewer steps; the embodiments of the present disclosure are not limited in this regard.

At least one embodiment of the present disclosure further provides a video interpolation processing device, which can selectively perform frame insertion according to the comparison result between adjacent video frames, so as to effectively avoid the obvious deformation caused by picture switching during frame insertion, ensure the smoothness of the video, and thereby improve the viewing experience of users.

Fig. 11 is a schematic block diagram of a video frame interpolation processing apparatus according to at least one embodiment of the present disclosure.

For example, in at least one embodiment of the present disclosure, as shown in fig. 11, a video frame interpolation processing apparatus 80 includes an acquisition module 801, a comparison module 802, and an operation module 803.

For example, in at least one embodiment of the present disclosure, the acquisition module 801 is configured to acquire a first video frame and a second video frame of a video. The first video frame and the second video frame are temporally adjacent, and the first video frame is a forward frame of the second video frame. For example, the acquisition module 801 may implement step S101. For a specific implementation, refer to the description of step S101, which is not repeated here.

For example, in at least one embodiment of the present disclosure, the comparison module 802 is configured to obtain, based on the first video frame and the second video frame, a first comparison result between them. The first comparison result indicates whether a picture switch exists between the first video frame and the second video frame. For example, the comparison module 802 may implement step S102. For a specific implementation, refer to the description of step S102, which is not repeated here.
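One way the comparison module 802 could compute the first comparison result is sketched below. The sketch follows the structure of claims 2 to 6. A scene switch is declared when a whole-frame similarity does not exceed a second threshold. A subtitle switch is declared when the recognized subtitle texts differ and the subtitle sub-images are also not similar enough. Several choices here are assumptions made for illustration: the similarity metrics (grayscale histogram intersection for whole frames, normalized mean absolute difference for sub-images), the threshold values, and the function names. Text recognition (OCR) and subtitle-region cropping are assumed to happen elsewhere.

```python
from typing import List

# Hypothetical representation: grayscale images as rows of 0..255 values.
Image = List[List[int]]


def histogram(image: Image, bins: int = 16) -> List[float]:
    """Normalized grayscale histogram."""
    counts = [0] * bins
    for row in image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts]


def frame_similarity(a: Image, b: Image, bins: int = 16) -> float:
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(x, y) for x, y in zip(histogram(a, bins), histogram(b, bins)))


def sub_image_similarity(a: Image, b: Image) -> float:
    """1 - mean absolute pixel difference / 255 for equal-sized sub-images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return 1.0 - total / (255.0 * count)


def scene_switch(first: Image, second: Image, second_threshold: float = 0.5) -> bool:
    # Claim 6: no scene switch only if the similarity exceeds the second threshold.
    return not frame_similarity(first, second) > second_threshold


def subtitle_switch(first_text: str, second_text: str,
                    first_sub: Image, second_sub: Image,
                    first_threshold: float = 0.9) -> bool:
    # Claim 2: identical recognized text means no subtitle switch.
    if first_text == second_text:
        return False
    # Claims 3-4: texts differ (possibly an OCR error), so compare the sub-images.
    return not sub_image_similarity(first_sub, second_sub) > first_threshold


def picture_switch(first: Image, second: Image, first_text: str, second_text: str,
                   first_sub: Image, second_sub: Image) -> bool:
    """First comparison result: True if a scene switch or a subtitle switch exists."""
    return (scene_switch(first, second)
            or subtitle_switch(first_text, second_text, first_sub, second_sub))
```

Falling back to the sub-image comparison when the texts differ means that a single misrecognized character does not, by itself, suppress interpolation.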

For example, in at least one embodiment of the present disclosure, the operation module 803 is configured to determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. For example, the operation module 803 may implement step S103. For a specific implementation, refer to the description of step S103, which is not repeated here.
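The decision made by the operation module, together with the frame interpolation flags of claims 7 to 11, might be sketched as follows. The sketch assumes that the flag simply stores how many frames to insert at the next interpolated gap. The flag starts at `frames_per_gap`, a hypothetical parameter (for example, 1 for frame-rate doubling). It grows by `frames_per_gap` for every gap that is skipped because of a picture switch, and it is reset once the skipped frames have been made up. The source states only that the flag determines the number of inserted frames; the increment-and-reset rule is an assumption.

```python
from typing import List


def plan_insertions(switches: List[bool], frames_per_gap: int = 1) -> List[int]:
    """Return how many frames to insert in each gap between adjacent frames.

    switches[i] is the comparison result for frames i and i + 1:
    True means a picture switch exists there.
    """
    # First frame interpolation flag (assumed meaning: frames to insert next).
    flag = frames_per_gap
    plan: List[int] = []
    for switched in switches:
        if switched:
            # Claim 7: do not insert across a picture switch.
            plan.append(0)
            # Claims 8 and 11: update the flag so the skipped frames are made up later.
            flag += frames_per_gap
        else:
            # Claim 10: insert a number of frames based on the current flag.
            plan.append(flag)
            # Assumption: reset once the deficit has been repaid.
            flag = frames_per_gap
    return plan
```

For example, `plan_insertions([False, True, False, True, True, False])` gives `[1, 0, 2, 0, 0, 3]`. Carrying the skipped frames forward keeps the total number of inserted frames, and hence the output frame rate, unchanged, except when the video ends on a switch.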

It should be noted that the acquisition module 801, the comparison module 802, and the operation module 803 may be implemented by software, hardware, firmware, or any combination thereof. For example, they may be implemented as an acquisition circuit 801, a comparison circuit 802, and an operation circuit 803, respectively. The embodiments of the present disclosure do not limit their specific implementations.

It should be understood that the video frame interpolation processing apparatus 80 provided in the embodiments of the present disclosure can implement the foregoing video frame interpolation processing method 10. It can also achieve technical effects similar to those of the method 10, which are not repeated here.

It should be noted that, in the embodiments of the present disclosure, the video frame interpolation processing apparatus 80 may include more or fewer circuits or units. The connection relationships between the circuits or units are not limited and may be determined according to actual requirements. The specific configuration of each circuit is not limited either; according to the circuit principle, each circuit may be constituted by analog devices, digital chips, or other suitable means.

Fig. 12 is a schematic block diagram of another video frame interpolation processing apparatus provided in at least one embodiment of the present disclosure.

At least one embodiment of the present disclosure also provides a video frame interpolation processing apparatus 90. As shown in fig. 12, the video frame interpolation processing apparatus 90 includes a processor 910 and a memory 920. The memory 920 includes one or more computer program modules 921, which are stored in the memory 920 and configured to be executed by the processor 910. The one or more computer program modules 921 include instructions for performing the video frame interpolation processing method 10 provided by at least one embodiment of the present disclosure. When the processor 910 executes these instructions, it performs one or more steps of the video frame interpolation processing method 10. The memory 920 and the processor 910 may be interconnected by a bus system and/or another form of connection mechanism (not shown).

For example, the processor 910 may be a central processing unit (CPU), a digital signal processor (DSP), or another form of processing unit having data processing and/or program execution capabilities, such as a field programmable gate array (FPGA). For example, the CPU may have an x86 or ARM architecture. The processor 910 may be a general-purpose processor or a special-purpose processor, and may control other components in the video frame interpolation processing apparatus 90 to perform desired functions.

For example, the memory 920 may include any combination of one or more computer program products. These may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. The one or more computer program modules 921 may be stored on the computer-readable storage medium, and the processor 910 may execute them to implement the various functions of the video frame interpolation processing apparatus 90. The computer-readable storage medium may also store various applications and data, including data used and/or generated by the applications. For the specific functions and technical effects of the video frame interpolation processing apparatus 90, refer to the above description of the video frame interpolation processing method 10; they are not repeated here.

Fig. 13 is a schematic block diagram of yet another video frame interpolation processing apparatus 300 provided in at least one embodiment of the present disclosure.

Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals and stationary terminals. Mobile terminals include mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and in-vehicle terminals (e.g., in-vehicle navigation terminals). Stationary terminals include digital TVs and desktop computers. The video frame interpolation processing apparatus 300 shown in fig. 13 is merely an example and should not limit the functions or scope of use of the embodiments of the present disclosure.

For example, as shown in fig. 13, in some examples, the video frame interpolation processing apparatus 300 includes a processing device 301, such as a central processing unit or a graphics processing unit. The processing device 301 may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 302 or a program loaded from a storage device 308 into a random access memory (RAM) 303. The RAM 303 also stores various programs and data required for the operation of the computer system. The processing device 301, the ROM 302, and the RAM 303 are connected to one another via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.

For example, the following components may be connected to the I/O interface 305:

- input devices 306, including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, and gyroscope;
- output devices 307, including, for example, a liquid crystal display (LCD), a speaker, and a vibrator;
- a storage device 308, including, for example, a magnetic tape and a hard disk;
- a communication device 309, including a network interface card such as a LAN card or a modem.

The communication device 309 allows the video frame interpolation processing apparatus 300 to communicate with other devices in a wireless or wired manner to exchange data, performing communication processing via a network such as the Internet. A drive 310 is also connected to the I/O interface 305 as needed. A removable medium 311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 310 as needed. A computer program read from the removable medium 311 can then be installed into the storage device 308 as needed. Although fig. 13 illustrates the video frame interpolation processing apparatus 300 with various devices, not all of the illustrated devices are required to be implemented or included. More or fewer devices may be implemented or included instead.

For example, the video frame interpolation processing apparatus 300 may further include a peripheral interface (not shown in the figure) and the like. The peripheral interface may be of various types, such as a USB interface or a Lightning interface. The communication device 309 may communicate with networks and other devices by wireless communication. Such networks include the Internet, intranets, and/or wireless networks such as cellular telephone networks, wireless local area networks (LANs), and/or metropolitan area networks (MANs). The wireless communication may use any of a variety of communication standards, protocols, and technologies, including but not limited to:

- Global System for Mobile Communications (GSM) and Enhanced Data GSM Environment (EDGE);
- Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), and Time Division Multiple Access (TDMA);
- Bluetooth and Wi-Fi (e.g., based on the IEEE 802.11a, IEEE 802.11b, IEEE 802.11g, and/or IEEE 802.11n standards);
- Voice over Internet Protocol (VoIP) and WiMAX;
- protocols for e-mail, instant messaging, and/or Short Message Service (SMS);
- any other suitable communication protocol.

For example, the video frame interpolation processing apparatus 300 may be any device, such as a mobile phone, a tablet computer, a notebook computer, an e-book reader, a game console, a television, a digital photo frame, or a navigator. It may also be any combination of data processing devices and hardware. The embodiments of the present disclosure are not limited in this regard.

For example, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium. The computer program contains program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 309, or installed from the storage device 308 or the ROM 302. When the processing device 301 executes the computer program, it performs the video frame interpolation processing method 10 disclosed in the embodiments of the present disclosure.

It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include, but are not limited to:

- an electrical connection having one or more wires;
- a portable computer diskette or a hard disk;
- a random access memory (RAM), a read-only memory (ROM), or an erasable programmable read-only memory (EPROM or flash memory);
- an optical fiber or a portable compact disc read-only memory (CD-ROM);
- an optical storage device or a magnetic storage device;
- any suitable combination of the foregoing.

In the embodiments of the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal that carries computer-readable program code and is propagated in baseband or as part of a carrier wave. Such a propagated data signal may take a variety of forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium. Such a medium can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.

The computer-readable medium may be included in the video frame interpolation processing apparatus 300, or it may exist alone without being assembled into the video frame interpolation processing apparatus 300.

Fig. 14 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure.

Embodiments of the present disclosure also provide a non-transitory readable storage medium. As shown in fig. 14, a non-transitory readable storage medium 140 stores computer instructions 111. When a processor executes these instructions, they perform one or more steps of the video frame interpolation processing method 10 described above.

For example, the non-transitory readable storage medium 140 may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium may contain program code for acquiring the first video frame and the second video frame of a video. Another may contain program code for obtaining, based on the first video frame and the second video frame, the first comparison result between them. Yet another may contain program code for determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame. Of course, the various program codes described above may also be stored on the same computer-readable medium, and the embodiments of the present disclosure are not limited in this regard.

For example, when a computer reads the program code, it may execute the program code stored in the computer storage medium to perform, for example, the video frame interpolation processing method 10 provided by any embodiment of the present disclosure.

For example, the storage medium may include any of the following, alone or in combination:

- a memory card of a smartphone;
- a storage component of a tablet computer;
- a hard disk of a personal computer;
- random access memory (RAM);
- read-only memory (ROM);
- erasable programmable read-only memory (EPROM);
- portable compact disc read-only memory (CD-ROM);
- flash memory;
- other suitable storage media.

For example, the readable storage medium may also be the memory 920 in fig. 12. For the related description, refer to the foregoing, which is not repeated here.

Embodiments of the present disclosure also provide an electronic device. Fig. 15 is a schematic block diagram of an electronic device in accordance with at least one embodiment of the present disclosure. As shown in fig. 15, the electronic device 120 may include the video frame interpolation processing apparatus 80, 90, or 300 described above. For example, the electronic device 120 may implement the video frame interpolation processing method 10 provided by any embodiment of the present disclosure.

In the present disclosure, the term "plurality" refers to two or more, unless explicitly defined otherwise.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles, including such departures from the present disclosure as come within known or customary practice in the art. The specification and examples are to be considered exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A video frame interpolation processing method, comprising:

acquiring a first video frame and a second video frame of a video, wherein the first video frame and the second video frame are temporally adjacent, and the first video frame is a forward frame of the second video frame;

obtaining, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame, wherein the first comparison result indicates whether a picture switch exists between the first video frame and the second video frame; and

determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame, wherein the picture switch includes a subtitle switch and a scene switch;

wherein obtaining, based on the first video frame and the second video frame, the first comparison result between the first video frame and the second video frame comprises:

determining whether the subtitle switch exists between the first video frame and the second video frame based on whether subtitle contents of the first video frame and the second video frame are the same, wherein determining whether the subtitle switch exists based on whether the subtitle contents are the same comprises:

acquiring an audio segment corresponding to the first video frame;

acquiring, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment; and

determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame;

wherein determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame comprises:

in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and

in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.

2. The method of claim 1, wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle contents of the first video frame and the second video frame are the same comprises:

acquiring first recognized text content of the first video frame;

acquiring second recognized text content of the second video frame; and

in response to the first recognized text content and the second recognized text content being the same, determining that the subtitle switch does not exist between the first video frame and the second video frame.

3. The method of claim 2, wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle contents of the first video frame and the second video frame are the same further comprises, in response to the first recognized text content and the second recognized text content being different:

acquiring a first sub-image of the first video frame, wherein the first sub-image corresponds to first subtitle content of the first video frame;

acquiring a second sub-image of the second video frame, wherein the second sub-image corresponds to second subtitle content of the second video frame; and

determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame.

4. The method of claim 3, wherein determining, based on the first sub-image and the second sub-image, whether the subtitle switch exists between the first video frame and the second video frame comprises:

determining, based on the first sub-image and the second sub-image, a first similarity between the first sub-image and the second sub-image;

in response to the first similarity being greater than a first threshold, determining that the subtitle switch does not exist between the first video frame and the second video frame; and

in response to the first similarity not being greater than the first threshold, determining that the subtitle switch exists between the first video frame and the second video frame.

5. The method of claim 1, wherein obtaining, based on the first video frame and the second video frame, the first comparison result between the first video frame and the second video frame comprises:

determining whether the scene switch exists between the first video frame and the second video frame based on whether scenes of the first video frame and the second video frame are the same.

6. The method of claim 5, wherein determining whether the scene switch exists between the first video frame and the second video frame based on whether the scenes of the first video frame and the second video frame are the same comprises:

acquiring a second similarity between the first video frame and the second video frame;

in response to the second similarity being greater than a second threshold, determining that the scene switch does not exist between the first video frame and the second video frame; and

in response to the second similarity not being greater than the second threshold, determining that the scene switch exists between the first video frame and the second video frame.

7. The method of claim 1, wherein determining, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame comprises:

in response to the first comparison result indicating that the picture switch does not exist between the first video frame and the second video frame, determining to insert a frame between the first video frame and the second video frame; and

in response to the first comparison result indicating that the picture switch exists between the first video frame and the second video frame, determining not to insert a frame between the first video frame and the second video frame.

8. The method of claim 1, further comprising:

setting a first frame interpolation flag; and

in response to the picture switch existing between the first video frame and the second video frame, modifying the first frame interpolation flag to a second frame interpolation flag.

9. The method of claim 8, further comprising:

in response to the picture switch existing between the first video frame and the second video frame, acquiring a fourth video frame, wherein the fourth video frame and the second video frame are temporally adjacent, and the second video frame is a forward frame of the fourth video frame;

obtaining, based on the second video frame and the fourth video frame, a second comparison result between the second video frame and the fourth video frame, wherein the second comparison result indicates whether the picture switch exists between the second video frame and the fourth video frame; and

determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame.

10. The method of claim 9, wherein determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame comprises:

in response to the second comparison result indicating that the picture switch does not exist between the second video frame and the fourth video frame, inserting a plurality of video frames between the second video frame and the fourth video frame, wherein the number of the plurality of video frames is based on the second frame interpolation flag.

11. The method of claim 9, wherein determining, based on the second comparison result, whether to insert a frame between the second video frame and the fourth video frame comprises:

in response to the second comparison result indicating that the picture switch exists between the second video frame and the fourth video frame, determining not to insert a frame between the second video frame and the fourth video frame; and

modifying the second frame interpolation flag to a third frame interpolation flag, wherein the third frame interpolation flag indicates the number of frames to be inserted at the next frame interpolation.

12. The method of claim 1, further comprising:

in response to inserting a third video frame between the first video frame and the second video frame, acquiring a first sub-image of the first video frame, wherein the first sub-image corresponds to first subtitle content in the first video frame;

acquiring a third sub-image of the third video frame, wherein the third sub-image corresponds to third subtitle content in the third video frame; and

determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame.

13. The method of claim 12, wherein determining, based on the first sub-image and the third sub-image, whether to replace the third video frame with the first video frame comprises:

acquiring a pixel value of a first pixel in the first sub-image, wherein the pixel value of the first pixel is greater than a third threshold;

setting, based on the pixel value of the first pixel of the first sub-image, a pixel value of a third pixel of the third sub-image, wherein the relative position of the third pixel in the third sub-image is the same as the relative position of the first pixel in the first sub-image; and

determining, based on the first sub-image and the set third sub-image, whether to replace the third video frame with the first video frame.

14. A video frame interpolation processing apparatus, comprising:

an acquisition module configured to acquire a first video frame and a second video frame of a video, wherein the first video frame and the second video frame are temporally adjacent, and the first video frame is a forward frame of the second video frame;

a comparison module configured to obtain, based on the first video frame and the second video frame, a first comparison result between the first video frame and the second video frame, wherein the first comparison result indicates whether a picture switch exists between the first video frame and the second video frame; and

an operation module configured to determine, based on the first comparison result, whether to insert a frame between the first video frame and the second video frame;

wherein the picture switch includes a subtitle switch and a scene switch, and obtaining, based on the first video frame and the second video frame, the first comparison result between the first video frame and the second video frame comprises:

determining whether the subtitle switch exists between the first video frame and the second video frame based on whether subtitle contents of the first video frame and the second video frame are the same;

wherein determining whether the subtitle switch exists between the first video frame and the second video frame based on whether the subtitle contents of the first video frame and the second video frame are the same comprises:

acquiring an audio segment corresponding to the first video frame;

acquiring, based on the audio segment, a start video frame and an end video frame corresponding to the audio segment; and

determining, based on the start video frame and the end video frame, whether the subtitle switch exists between the first video frame and the second video frame, which comprises:

in response to the second video frame being between the start video frame and the end video frame, determining that the subtitle switch does not exist between the first video frame and the second video frame; and

in response to the second video frame not being between the start video frame and the end video frame, determining that the subtitle switch exists between the first video frame and the second video frame.

15. A video frame interpolation processing apparatus, comprising:

a processor; and

a memory including one or more computer program modules;

wherein the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules comprise instructions for performing the video frame interpolation processing method of any one of claims 1-13.

16. A non-transitory readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, perform the video frame interpolation processing method of any one of claims 1-13.
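The audio-based subtitle-switch test of claims 1 and 14 can be sketched as below. The sketch makes several assumptions that the source does not fix:

- Audio segments are given as (start, end) times in seconds, produced by some speech segmentation step.
- Frame i is displayed at time i / fps.
- The start and end video frames are the first and last frames whose timestamps fall inside the segment.
- "Between" is inclusive.
- The function returns None when no segment covers the first frame, so that another test (for example, the text comparison of claim 2) can decide.

All names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class AudioSegment:
    """Hypothetical speech segment, times in seconds."""
    start_s: float
    end_s: float


def segment_for_frame(segments: Sequence[AudioSegment], frame_index: int,
                      fps: float) -> Optional[AudioSegment]:
    """Return the audio segment that covers the given frame's timestamp, if any."""
    t = frame_index / fps
    for seg in segments:
        if seg.start_s <= t <= seg.end_s:
            return seg
    return None


def segment_frame_range(seg: AudioSegment, fps: float) -> Tuple[int, int]:
    """Start and end video frames: first and last frame indices inside the segment."""
    return math.ceil(seg.start_s * fps), math.floor(seg.end_s * fps)


def subtitle_switch_by_audio(first_index: int, segments: Sequence[AudioSegment],
                             fps: float) -> Optional[bool]:
    seg = segment_for_frame(segments, first_index, fps)
    if seg is None:
        return None  # assumption: undecided here; defer to another test
    start, end = segment_frame_range(seg, fps)
    second_index = first_index + 1  # the temporally adjacent next frame
    # No subtitle switch while the second frame stays within the segment's frames.
    return not (start <= second_index <= end)
```

With a single segment from 1.0 s to 2.0 s at 25 fps, the segment spans frames 25 to 50. Frame 30 followed by frame 31 is therefore treated as the same subtitle, while the pair of frames 50 and 51 is treated as a subtitle switch.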