CN110209879B - Video playing method, device, equipment and storage medium - Google Patents
- Publication number
- CN110209879B (application CN201810926642.2A)
- Authority
- CN
- China
- Prior art keywords
- video
- redundant
- playing
- group
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/74—Browsing; Visualisation therefor
- G06F16/743—Browsing; Visualisation therefor a collection of video files or sequences
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/78—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/783—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/7837—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Television Signal Processing For Recording (AREA)
Abstract
The embodiment of the invention discloses a video playing method, a video playing device, video playing equipment and a storage medium. The embodiment of the invention can acquire a target video; obtain the scene category of each video frame according to the picture content of the video frames in the target video; merge video frames that are adjacent in play order and share the same scene category into a redundant video group; and, when playback of the target video reaches a redundant video group, play that group according to a preset play mode. According to the scheme, the redundant segments in the video are automatically filtered out; when watching the video, the user can set the play mode as needed to control how the redundant segments are played, which improves the efficiency of adjusting the played content and makes such adjustment very convenient.
Description
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a video playing method, apparatus, device, and storage medium.
Background
With the rapid development of networks, video resources have grown explosively, and video content covers life and entertainment, education, work, and other aspects; people watch various videos almost every day through devices such as televisions, tablet computers, and mobile phones.
When watching a video, if the user is not interested in redundant segments whose content, such as scenery, is repeated, the user can drag the progress bar of the video to skip the uninteresting segments and watch only the scenario.
In the research and practice of the prior art, the inventor of the present invention found that, for reasons such as unfamiliarity with the video content, a user who drags the progress bar based on experience may skip too much or too little content at a time, and multiple repeated adjustments are required before reaching the content the user wants to watch. Therefore, the existing video playing mode consumes a great deal of the user's effort and time to screen out the content to be watched, the adjustment efficiency is low, and adjusting the played content is very inconvenient.
Disclosure of Invention
The embodiment of the invention provides a video playing method, a device, equipment and a storage medium, which aim to automatically filter redundant segments in a video and solve the technical problems of low efficiency and inconvenience in adjusting played content.
The embodiment of the invention provides a video playing method, which comprises the following steps:
acquiring a target video;
acquiring scene categories of the video frames according to the picture content of the video frames in the target video;
combining video frames adjacent in playing sequence and same in scene category into a redundant video group;
and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
In some embodiments, the merging video frames with adjacent play orders and same scene categories into a redundant video group includes: determining a characteristic video frame in the redundant video group;
the playing the redundant video group according to the preset playing mode includes: the feature video frames of each redundant video group are played sequentially.
In some embodiments, the merging the video frames with adjacent playing orders and same scene categories into a redundant video group further includes:
and identifying the redundant video group by using a preset redundant mark in the playing page of the target video.
In some embodiments, the identifying the redundant video group in the playing page of the target video by using a preset redundant mark includes:
determining the playing time of the redundant video group;
determining a mark position in a playing page of the target video according to the playing time;
and displaying a preset redundant mark at the mark position so as to identify the redundant video group.
In some embodiments, the redundant mark includes a folding mark, and determining a mark position in the playing page of the target video means determining the playing interval of the redundant video group on a playing progress bar in the playing page;
and displaying a preset redundant mark at the mark position to fold the playing interval on the playing progress bar.
In some embodiments, the playing the redundant video group according to a preset playing mode includes:
and if an unfolding instruction for the folding mark of the next redundant video group is received before the key frame of the next redundant video group is played, unfolding the playing interval of the next redundant video group on the playing progress bar, and sequentially playing each frame in the next redundant video group.
In some embodiments, the preset redundant marks include scene tags and/or the duration of the redundant video group when played expanded.
The embodiment of the invention also provides a video playing device, which comprises:
the acquisition unit is used for acquiring the target video;
the identification unit is used for acquiring scene categories of the video frames according to the picture content of the video frames in the target video;
the merging unit is used for merging video frames with adjacent playing sequences and same scene categories into a redundant video group;
And the playing unit is used for playing the redundant video group according to a preset playing mode when the target video is played to the redundant video group.
The embodiment of the invention also provides video playing equipment, which comprises: a display screen, a processor, a memory, and a video playback program stored on the memory and executable on the processor, wherein:
the display screen is used for displaying the target video;
the video playing program, when executed by the processor, implements the steps of any video playing method provided by the embodiments of the present invention.
The embodiment of the invention also provides a storage medium, which stores a plurality of instructions, wherein the instructions are suitable for being loaded by a processor to execute the steps in any video playing method provided by the embodiment of the invention.
As can be seen from the above, the embodiment of the invention acquires the target video and then identifies the scene category of each video frame according to the picture content of the video frames in the target video, so that the picture content is analyzed to judge whether the content of a video frame is a redundant repeat. Video frames that are adjacent in play order and share the same scene category are merged into a redundant video group, and when playback of the target video reaches a redundant video group, the group is played according to a preset play mode. According to the scheme, the scene category of every video frame in the target video is obtained by automatically analyzing its picture content, and adjacent frames of the same scene category are merged into redundant video groups, i.e., video segments with repeated content; further, during playback, how the screened redundant video groups are played, such as skipped or played normally, can be controlled by the play mode set by the user. The scheme thus automatically filters out the redundant segments in the video: when watching, the user can set the play mode as needed to control how redundant segments are played, selectively watch or filter the video content, and adjust the played content without repeatedly dragging the progress bar. This improves the efficiency of adjusting the played content, makes such adjustment very convenient, and improves the user experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic view of a scenario of an information interaction system according to an embodiment of the present invention;
fig. 1b is a schematic flow chart of a video playing method according to an embodiment of the present invention;
fig. 2a is another flow chart of a video playing method according to an embodiment of the present invention;
FIG. 2b is a schematic diagram of a playback page according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a video playing scene provided by an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a video playing device according to an embodiment of the present invention;
fig. 5 is a schematic diagram of another structure of a video playing device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a video playing device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
The embodiment of the invention provides a video playing method, a video playing device, video playing equipment and a storage medium.
Referring to fig. 1a, an embodiment of the present invention provides an information interaction system, including: a terminal 10 and a server 20, the terminal 10 and the server 20 being connected via a network 30. The network 30 includes network entities such as routers and gateways, which are not shown in the figure. The terminal 10 may interact with the server 20 through a wired network or a wireless network, for example, may download video from the server 20. The terminal 10 may be a mobile phone, a tablet computer, a notebook computer, or the like, and fig. 1a illustrates the terminal 10 as a mobile phone. The terminal 10 may have installed therein various applications required by users, such as video applications and the like.
Based on the system shown in fig. 1a, the terminal 10 may download video applications and/or video application update data packets and/or data information or service information (e.g., video information) related to the video applications from the server 20 via the network 30 as required. With the embodiment of the present invention, the terminal 10 may acquire the target video from the server 20 through the network; then, the terminal 10 obtains the scene category of each video frame according to the picture content of the video frames in the target video, and merges video frames that are adjacent in play order and share the same scene category into a redundant video group; when playback of the target video reaches the redundant video group, the terminal 10 plays the redundant video group according to a preset play mode. In this way, the redundant segments in the video are automatically filtered out; when watching the video, the user can set the play mode as needed to control how the redundant segments are played, selectively watch or filter the video content, and adjust the played content without repeatedly dragging the progress bar, which improves the efficiency of adjusting the played content, makes such adjustment very convenient, and improves the user experience.
The example of fig. 1a is merely an example of a system architecture for implementing an embodiment of the present invention, and the embodiment of the present invention is not limited to the system architecture shown in fig. 1a, and various embodiments of the present invention are proposed based on the system architecture.
In this embodiment, description will be made from the viewpoint of a video playback apparatus which can be integrated in a terminal, a tablet computer, a personal computer, or the like.
The embodiment of the invention provides a video playing method, which comprises the following steps: acquiring a target video; acquiring scene categories of video frames according to picture contents of the video frames in the target video; combining video frames adjacent in playing sequence and same in scene category into a redundant video group; and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
As shown in fig. 1b, the video playing method may be executed before or during video playback, and the specific flow may be as follows:
101. and acquiring a target video.
The target video may be a video stored locally in the device, or may be a network video loaded from a network, or the like.
For example, the video playback apparatus locally loads the target video from the device when starting to play the video.
For another example, when receiving a video playing instruction or during the video playing process, the video playing device loads the target video file in the streaming media format through the network, thereby realizing downloading and playing.
Of course, the video playing device may trigger redundant identification operation after detecting the target video stored in the local device, load the target video from the local device, and identify the redundant video segment.
102. And acquiring scene categories of the video frames according to the picture content of the video frames in the target video.
Wherein, a video frame is a single still image, and a sequence of still images forms a video. The target video is composed of a plurality of video frames arranged in play order.
The video playing device identifies the picture content of each video frame in the target video individually and obtains the scene category of each frame, such as scenery, subtitles, or an inserted advertisement.
In some embodiments, to more accurately identify the scene category of the video frame, step 102 may include: inputting the target video into a preset classification model to trigger the classification model to classify the video frames according to the picture content of the video frames in the target video; and obtaining a classification result output by the classification model to obtain scene categories of the video frames.
The preset classification model can be obtained through training of a large number of training videos and corresponding scene categories. The video playing device sequentially inputs video frames in the target video into a preset classification model according to the playing sequence to classify, or inputs the target video into the preset classification model, and the classification model classifies each video frame in the target video according to the playing sequence.
The classification model analyzes and obtains scene categories of the video frames, such as scenery, captions, inserted advertisements or normal scenario, and the like, according to scene classification rules and picture contents in the video frames. Then, the classification model outputs the scene category of the video frame as a classification result to the video playback device.
The video playing device receives the classification result output by the classification model and analyzes the classification result to obtain the scene category of the video frame.
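The per-frame classification flow described above can be sketched as follows. This is a minimal Python illustration: the classifier interface, the toy stand-in model, and the label names are assumptions for demonstration, not the patent's actual model.

```python
# Hypothetical sketch of per-frame scene classification.
# SCENE_LABELS and toy_model are illustrative assumptions only.
SCENE_LABELS = ["scenery", "subtitles", "advertisement", "plot"]

def classify_frames(frames, model):
    """Return the scene category of every frame, in play order."""
    return [model(frame) for frame in frames]

def toy_model(frame):
    # Stand-in for a trained classifier: here it just reads a precomputed tag.
    return frame["tag"]

frames = [{"tag": "scenery"}, {"tag": "scenery"}, {"tag": "plot"}]
print(classify_frames(frames, toy_model))  # ['scenery', 'scenery', 'plot']
```

In a real deployment the `toy_model` stand-in would be replaced by the trained classification model described above.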
In some embodiments, to improve the efficiency of video frame classification, a plurality of classification models are preset, and the step of inputting the target video into the preset classification models may include: dividing the target video into segments to be analyzed according to a preset duration; and inputting the segments to be analyzed into the preset classification models respectively.
For example, the target video is in a streaming media format; the video playing device cuts the target video according to a preset duration, for example 1 minute, into segments to be analyzed of that duration.
The video playing device inputs the cut fragments to be analyzed into different classification models for classification. Therefore, parallel processing of video frame scene category identification is realized, and analysis efficiency is greatly improved.
For example, 3 classification models are preset. After the video playing device obtains the first segment to be analyzed through segmentation, the first segment is input into the first classification model for classification; after the second segment to be analyzed is obtained, it is input into the second classification model; after the third segment to be analyzed is obtained, it is input into the third classification model; after the fourth segment to be analyzed is obtained, it is input into the first classification model again, or into whichever classification model is currently idle, and so on, thereby realizing parallel processing of a plurality of segments to be analyzed.
If the target video is stored locally, the video playing device can cut the target video into a plurality of segments with preset duration according to the playing sequence, and respectively input a preset classification model for classification.
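The segmentation and parallel dispatch described above can be sketched as follows. The 60-second default and round-robin assignment mirror the example in the text; the function names are illustrative assumptions.

```python
def split_segments(total_seconds, segment_seconds=60):
    """Cut a video of total_seconds into fixed-duration (start, end) intervals."""
    return [(s, min(s + segment_seconds, total_seconds))
            for s in range(0, total_seconds, segment_seconds)]

def dispatch(segments, num_models=3):
    """Assign segment i to classification model i % num_models (round-robin)."""
    return [(i % num_models, seg) for i, seg in enumerate(segments)]

segs = split_segments(150, 60)
print(segs)               # [(0, 60), (60, 120), (120, 150)]
print(dispatch(segs, 3))  # [(0, (0, 60)), (1, (60, 120)), (2, (120, 150))]
```

The "currently idle model" variant in the text would replace the round-robin index with a work queue, but the fixed assignment above is the simplest form of the parallelism described.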
In some embodiments, before the step of inputting the target video into the preset classification model, the preset training video and the corresponding scene category information are input into the preset machine learning model for training, so as to obtain the classification model.
For example, the preset machine learning model may be a TensorFlow artificial intelligence learning system. The preset training video may include video of various movie categories such as movies, television shows, sports games, and animations. The method can be used for classifying the scenes of the video frames in the training video in advance, configuring the corresponding scene categories, wherein the scene categories of the video frames in the training video are the scene category information corresponding to the training video.
The video playing device inputs the training video and the corresponding scene category information into a preset machine learning model, so that the machine learning model automatically learns and trains scene classification rules according to the picture content of video frames in the training video, and the input other videos can be classified.
In some embodiments, the video playing device may segment the training video into training segments with preset durations, and input the training segments into the machine learning model for training. The preset time length can be 1 minute and the like, and the configuration can be flexibly carried out according to actual needs.
Further, when model training is performed, training videos of different film and television categories and corresponding scene category information can be respectively input into a preset machine learning model for training, and a classification model corresponding to each film and television category is obtained. The video category may be classified according to the overall scenario content, video format, etc., such as movies, television shows, cartoons, documentaries, entertainment, finance, military, law, science and technology, or travel.
Therefore, the classification models of the video categories can be respectively trained, so that the video frames of the videos of different video categories can be classified by using the corresponding classification models in a targeted manner, and the accuracy of identifying the scene categories of the video frames is improved.
Correspondingly, the step of inputting the target video into the preset classification model includes: acquiring the video category of a target video; and inputting the target video into a classification model of the same video category according to the video category of the target video.
The film and television category of the target video can be obtained from attribute information of the target video, such as a movie, a television show, a sports match, an animation, and the like. The video playing device finds out a classification model identical to the video category of the target video in a plurality of preset classification models according to the video category of the target video, and then inputs the target video into the classification model identical to the video category. For example, if the video category of the target video is a movie, inputting the target video into a classification model of which the video category is a movie; if the video category of the target video is documentary, inputting the target video into a classification model with the video category of documentary.
103. And merging the video frames with adjacent playing sequences and same scene categories into a redundant video group.
Two video frames are judged to be adjacent in play order if the number of video frames between them does not exceed a preset value. The preset value may be 0, 1, 2, or the like, and is flexibly configured according to actual needs, so that the method has a certain fault tolerance and compatibility.
After the video playing device identifies the scene types of the video frames, the video frames with adjacent playing orders and the same scene types are combined into a redundant video group, and the obtained redundant video group is the redundant segment in the target video, and is repeated in content and possibly needs to be skipped.
In some embodiments, step 103 may comprise: identifying redundant video frames according to scene categories of the video frames; and merging the redundant video frames which are adjacent in play order and have the same scene category into a redundant video group.
For example, the video playing device determines whether the scene category of the video frame belongs to the redundancy category according to the preset classification information. The preset classification information records scene categories contained in the redundant categories, such as grasslands, trees, gardens, subtitles, and inserted advertisements. If the scene category of the video frame belongs to the redundancy category, the video playing device determines that the video frame is a redundancy video frame; if the scene category of the video frame does not belong to the redundancy category, the video playing device determines that the video frame is a non-redundancy video frame.
Then, the video playing device merges the redundant video frames with adjacent playing orders and same scene categories into a redundant video group. The redundant video group is a segment with similar or same picture content, so that the integration of the similar picture content is realized, and the user can conveniently regulate and control the playing of the video. The video playing device only merges video frames with scene categories belonging to redundancy categories, and can realize the merging of redundancy fragments of specific scenes, such as redundancy fragments with no pushing effect on video drama, such as scenery and the like; and, merging video frames that are identical in scene but not desired to be merged, such as a person conversation, etc., is avoided.
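The merging logic above can be sketched as follows. The set of redundant categories and the function name are illustrative assumptions; `max_gap` models the preset adjacency tolerance described earlier.

```python
# Illustrative assumption: which scene categories count as redundant.
REDUNDANT_CATEGORIES = {"scenery", "subtitles", "advertisement"}

def merge_redundant(frame_categories, max_gap=0):
    """Group play-order-adjacent frames of the same redundant scene category.

    frame_categories: one category string per frame, in play order.
    max_gap: frames count as adjacent if at most this many frames lie between them.
    Returns a list of (category, frame_indices) redundant video groups.
    """
    groups, current, current_cat = [], [], None
    for i, cat in enumerate(frame_categories):
        if (cat in REDUNDANT_CATEGORIES and cat == current_cat
                and current and i - current[-1] - 1 <= max_gap):
            current.append(i)
        else:
            if len(current) > 1:
                groups.append((current_cat, current))
            current = [i] if cat in REDUNDANT_CATEGORIES else []
            current_cat = cat if cat in REDUNDANT_CATEGORIES else None
    if len(current) > 1:
        groups.append((current_cat, current))
    return groups

cats = ["scenery", "scenery", "plot", "subtitles", "subtitles", "subtitles"]
print(merge_redundant(cats))  # [('scenery', [0, 1]), ('subtitles', [3, 4, 5])]
```

Note that the "plot" frame is never grouped: only categories in the redundant set are merged, matching the text's point that ordinary scenario frames (e.g., a conversation) are left untouched.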
In some embodiments, after the step of merging the video frames with adjacent play orders and same scene categories into the redundant video group, the method may further include: configuring scene categories of video frames in the redundant video group as scene categories of the redundant video group; and combining the redundant video groups which are adjacent in play sequence and have scene categories belonging to the same main category into a group.
Two redundant video groups are judged to be adjacent in play order if the number of video frames between them does not exceed a preset value. The preset value may be 0, 10, 20, or the like, and is flexibly configured according to actual needs, so that a certain fault tolerance and compatibility are achieved.
The video playing device configures the scene category of the video frame in the redundant video group as the scene category of the redundant video group. The scene categories belonging to the same main category means that the scene categories are the same, or in a preset scene model tree, the two scene categories are correspondingly subordinate to the same main category as sub-categories. The preset scene model tree comprises a plurality of main classes, and each main class comprises one or more sub-classes similar to the scene, namely the scene class. For example, the main class is landscape, and the sub-classes included in the main class can be grasslands, rivers, mountains, forests, flowers and seas and other scene classes. If the scene categories of two adjacent redundant video groups in the playing sequence are the grassland and the river respectively, the video playing device can determine that the grassland and the river are subordinate to the main category of the landscape according to a preset scene model tree, namely, the scene categories of the two redundant video groups are subordinate to the same main category.
Therefore, the video playing device combines the redundant video groups which are adjacent in playing sequence and belong to the same main class into a group, namely, the redundant video fragments with the same or similar scenes are combined, and the influence of finer granularity of the redundant video groups on user experience is avoided.
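The main-category merge based on the scene model tree can be sketched as follows. The tree contents and the gap tolerance are illustrative assumptions; the patent's actual tree would be preset by the system.

```python
# Illustrative scene model tree: sub-category -> main category.
SCENE_TREE = {
    "grassland": "landscape", "river": "landscape", "mountain": "landscape",
    "subtitles": "text", "advertisement": "advertisement",
}

def same_main_category(cat_a, cat_b):
    """True if the categories are equal or subordinate to the same main class."""
    if cat_a == cat_b:
        return True
    ma, mb = SCENE_TREE.get(cat_a), SCENE_TREE.get(cat_b)
    return ma is not None and ma == mb

def merge_groups(groups, max_gap=10):
    """Merge play-order-adjacent redundant groups sharing a main category.

    groups: list of (scene_category, frame_indices) in play order.
    """
    merged = []
    for cat, frames in groups:
        if merged:
            prev_cat, prev_frames = merged[-1]
            if (same_main_category(prev_cat, cat)
                    and frames[0] - prev_frames[-1] - 1 <= max_gap):
                # Label the merged group with the shared main category.
                merged[-1] = (SCENE_TREE.get(cat, cat), prev_frames + frames)
                continue
        merged.append((cat, frames))
    return merged

print(merge_groups([("grassland", [0, 1]), ("river", [3, 4])]))
# [('landscape', [0, 1, 3, 4])]
```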
It should be noted that if a redundant video group is located at the starting position of the target video in play order, the redundant video group is played directly and is not merged.
104. And when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
The preset play mode may include a thumbnail mode, a sequential mode, and the like. The thumbnail mode refers to skipping over the redundant video group or playing a preset number of video frames in the redundant video group for a user to browse quickly; the sequential mode refers to normal play of video frames in the redundant video group according to the play order.
The video playing device plays the target video in the playing page, and a user can adjust the playing mode at any time in the playing process. For example, the user may adjust the play mode by switching the virtual key in the play page, or adjust the play mode by switching the key in the remote controller, etc.
If the playing mode is the sequential mode, the video playing device normally plays the video frames according to the playing sequence when playing the redundant video group.
If the playing mode is the thumbnail mode, when playing to the redundant video group, as an implementation manner, the video playing device skips the redundant video group and directly plays the video frames after the redundant video group. Of course, a prompt message may also be displayed before or after the skip to alert the user that the redundant video segments have been automatically skipped for them.
If the play mode is the thumbnail mode, as another implementation, the video playing device may display several video frames in the redundant video group for the user to browse quickly when playback reaches the group. Specifically, the step of merging the video frames that are adjacent in play order and share the same scene category into a redundant video group further includes: determining the feature video frames in the redundant video group; and the step of playing the redundant video group according to a preset play mode may include: sequentially playing the feature video frames of each redundant video group.
Wherein, the characteristic video frame can be a video frame with the picture content representative in the redundant video group.
For example, the video playback device may take the first video frame and/or the last video frame in the redundant video group as a feature video frame, or capture one video frame at every preset interval as a feature video frame.
As an embodiment, the step of "determining the feature video frames in the redundant video group" may include: acquiring the playing time length of the redundant video group and the frame frequency of the target video; calculating the position information of the characteristic video frame according to the playing time length and the frame frequency; and extracting the corresponding video frames from the redundant video group according to the position information to serve as characteristic video frames.
The video playing device may obtain the position of each video frame of the redundant video group in the target video, for example, the time at which the video frame is played. Therefore, the video playing device can calculate the playing duration of the redundant video group from the playing times of the first and last video frames in the redundant video group. Of course, the video playing device may also obtain the frame frequency of the target video and calculate the playing duration from the number of video frames in the redundant video group and the frame frequency. The frame frequency of the target video refers to the number of frames displayed per second, and can be obtained from the attribute information of the target video.
Then, the video playing device calculates the positions of the characteristic video frames to be played according to the playing duration and the frame frequency. As an embodiment, the video playing device determines the number of video frames in the redundant video group by multiplying the playing duration by the frame frequency. It then divides this number by the preset number of characteristic video frames; the result is the distribution interval of the characteristic video frames in the redundant video group, from which the positions of the characteristic video frames are determined. Of course, the number of characteristic video frames may also be configured according to the playing duration of the redundant video group, for example, set equal to the playing duration, or computed by inputting the playing duration into a preset calculation formula, and may be flexibly configured according to actual needs.
For example, if the playing duration is 50 seconds and the frame frequency is 30 frames/second, the video playing device determines that the redundant video group contains 1500 video frames. If the preset number of characteristic video frames is 3, the distribution interval is 500, and the video playing device may select the 500th, 1000th and 1500th video frames as characteristic video frames, or the 1st, 500th and 1000th, or the 1st, 750th and 1500th. Thereby, the position information of the characteristic video frames is obtained.
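The interval calculation above can be sketched as follows. This is a minimal illustration, not the patented implementation; the function name and the "last frame of each interval" selection rule are assumptions chosen to match one of the options in the example:

```python
def feature_frame_positions(duration_s, fps, num_features=3):
    """Pick evenly spaced feature-frame indices (1-based) in a redundant group.

    duration_s: playing duration of the redundant video group in seconds
    fps:        frame frequency of the target video (frames per second)
    """
    total_frames = int(duration_s * fps)       # playing duration x frame frequency
    interval = total_frames // num_features    # distribution interval
    # one option from the example: take the last frame of each interval
    return [interval * (i + 1) for i in range(num_features)]

# 50 s at 30 frames/s -> 1500 frames, interval 500
print(feature_frame_positions(50, 30))  # -> [500, 1000, 1500]
```

Selecting the 1st frame of each interval instead would yield the 1st, 500th and 1000th frames, the second option in the example.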
After the position of the characteristic video frame is obtained, the video playing device extracts the corresponding video frame in the redundant video group according to the position information to serve as the characteristic video frame. And when the target video is played to the redundant video group, playing the characteristic video frames in the redundant video group.
Therefore, the characteristic video frames are more uniformly selected, and the whole content of the redundant video group can be represented. When the redundant video group is played in the thumbnail mode, the video playing device plays the screened characteristic video frames, so that a user can quickly know the content of the redundant video group, and the continuity of the scenario is not affected.
As can be seen from the above, the embodiment of the invention acquires the target video and then identifies the scene category of each video frame according to its picture content, thereby analyzing the picture content to judge whether the content of the video frame is redundantly repeated. Video frames with adjacent playing orders and the same scene category are combined into a redundant video group, and when the target video is played to the redundant video group, the redundant video group is played according to a preset playing mode. In this scheme, the scene categories of all video frames in the target video are obtained by automatically analyzing the picture content, and adjacent video frames with the same scene category are combined into redundant video groups; that is, video segments with repeated content are obtained. Further, when the video is played, how the screened redundant video groups are played, such as skipped or played normally, can be controlled according to the play mode set by the user. Because the redundant segments in the video are filtered automatically, the user can set the playing mode as needed while watching, control how the redundant segments are played, and selectively watch or filter video content without repeatedly dragging the progress bar. This improves the efficiency of regulating the played content, makes regulation of the video playing content very convenient, and improves the user experience.
The method according to the previous embodiment will be described in further detail below taking the specific integration of the video playback device in the terminal as an example.
In an embodiment, a video playing method is provided, which is different from the above embodiment of the video playing method in that: in the embodiment, the terminal plays the target video and identifies the redundant segment, so that the redundant analysis of the online video is realized; and displaying a redundant mark in the playing page, so that a user can conveniently identify the redundant video clip, and further, the playing mode is determined. Referring to fig. 2a, the method comprises:
201. Acquire a target video.
The terminal can receive the video playing instruction and acquire the target video indicated by the video playing instruction.
For example, when a user uses a terminal to watch an online video, the user selects a video to be played, and inputs a video playing instruction on the terminal.
The terminal receives a video playing instruction input by a user, and acquires a target video indicated by the video playing instruction, namely, a video selected by the user.
202. Acquire scene categories of the video frames according to the picture content of the video frames in the target video.
203. Merge video frames with adjacent playing orders and the same scene category into a redundant video group.
For example, the terminal obtains the target video in the streaming media format through the network, plays the target video, and in the playing process, the terminal analyzes the picture content of the video frame of the target video in the streaming media format, and obtains the scene category of each video frame.
For example, after acquiring a video frame of a target video, the terminal first acquires a scene category of the video frame according to a picture content of the video frame.
If the scene category of a video frame is the same as that of the video frame adjacent to it in playing order, the video frames that are adjacent in playing order and have the same scene category are merged into a redundant video group. If the scene category of a video frame differs from that of the adjacent video frame, the video frame is played normally in its playing order; while it plays, the terminal continues to analyze the picture content of the remaining video frames and to merge adjacent same-category video frames.
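The frame-merging rule above can be sketched as follows. This is a minimal illustration under stated assumptions: frames are represented only by their scene category, and a run of at least two same-category frames counts as a redundant video group (the threshold is an assumption, not specified by the source):

```python
from itertools import groupby

def merge_redundant_groups(scene_categories, min_run=2):
    """Return (first_index, last_index, category) for each run of consecutive
    frames sharing a scene category that is long enough to count as redundant."""
    groups, start = [], 0
    for category, run in groupby(scene_categories):
        length = len(list(run))
        if length >= min_run:
            groups.append((start, start + length - 1, category))
        start += length  # frames outside a run keep their normal play order
    return groups

print(merge_redundant_groups(["A", "A", "A", "B", "C", "C"]))
# -> [(0, 2, 'A'), (4, 5, 'C')]
```

Frame 3 (category "B") is left out of any group and would be played normally, matching the behavior described above.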
204. Identify the redundant video group with a preset redundant mark in the playing page of the target video.
The preset redundant mark may be a preset special symbol and/or attribute information of the redundant video group. For example, the redundant mark may include a folding mark such as a plus sign "+", and, of course, may also include information such as the scene category of the redundant video group and/or its duration.
The terminal obtains the position of the video frame in the redundant video group in the target video, for example, the playing time of the video frame. Thus, the terminal may determine the start time of the redundant video group in the target video, for example, using the play time of the first video frame in the redundant video group as the start time of the play of the redundant video group.
Information such as the name and the playing progress bar of the target video is displayed in the playing page of the target video, the terminal finds the starting time position of the playing of the redundant video group in the playing progress bar of the target video, and then a redundant mark is displayed in the starting time position to identify that the redundant video group starts from the starting time position. Of course, the terminal may also obtain the end time of playing the redundant video set according to the playing time of the last video frame in the redundant video set, so as to display the redundant mark at the start time position and the end time position of the redundant video set.
In some embodiments, the step of identifying the redundant video group in the play page of the target video using the preset redundant mark may include: "determining the play time of the redundant video group; determining a mark position in a playing page of the target video according to the playing time; and displaying a preset redundant mark at the mark position to identify the redundant video group. "
The playing time of the redundant video group comprises a starting time and/or an ending time of playing the redundant video group, which can be determined from the playing times of the first and last video frames in the redundant video group: the terminal takes the playing time of the first video frame as the starting time and the playing time of the last video frame as the ending time. For example, if the starting time of the redundant video group is the 5th minute and the ending time is the 10th minute, the playing time is the 5th to the 10th minute.
Then, the terminal determines a mark position for displaying the redundant mark in the playing page of the target video according to the playing time. In one embodiment, the terminal may determine, on the playing progress bar of the playing page, the start position and/or end position of playing the redundant video group according to the playing time, and then determine the mark position of the redundant mark according to the start position and/or end position and a preset mark-region rule. For example, if the preset mark-region rule is "a preset distance below the start position", the terminal takes the position at the preset distance below the start position of the redundant video group on the progress bar as the mark position; if the rule is "below the progress bar, between the start position and the end position", the terminal takes the position below the progress bar between the start and end positions of the redundant video group as the mark position.
Thereby, the terminal obtains the mark position where the redundant mark is displayed.
And then, the terminal displays a preset redundant mark at the mark position, so that the redundant video group is identified, a user can conveniently know that the fragments in the target video are redundant fragments, and the positions of the redundant fragments are determined.
In some embodiments, the redundant mark comprises a folding mark for identifying a folded redundant video group and prompting the user about a folded redundant video segment. The step of "determining a mark position in a playing page of the target video" may comprise: "determining the playing interval of the redundant video group on the playing progress bar in the playing page"; correspondingly, the step of "displaying the preset redundant mark at the mark position" includes: "folding the playing interval on the playing progress bar".
After the play time of the redundant video group is obtained, the terminal finds the start position and the end position of the play of the redundant video group in the play progress bar of the play page according to the play time, and takes the progress bar between the start position and the end position as a play interval.
Then, the terminal takes the playing interval as the mark position and folds the playing interval on the playing progress bar, thereby folding the redundant video group on the progress bar. This lets the user perceive the identification of the redundant video segment more intuitively, and the folding mark in the redundant mark prompts the user to notice the folded redundant segment, improving the watching experience.
In some embodiments, if the preset redundant flag includes a scene category, the step of identifying the redundant video group using the preset redundant flag may further include: configuring a scene tag for the redundant video group according to the scene category of the video frame in the redundant video group; the scene tags of the redundant video groups are configured into redundant tags.
If the scene categories of the video frames contained in the redundant video group are all the same, the terminal configures that scene category as the scene tag of the redundant video group; alternatively, according to a preset scene model tree, it acquires the main category to which the scene category belongs and uses the main category as the scene tag.
If the scene categories of the video frames in the redundant video group differ but belong to the same main category, the terminal determines the main category as the scene tag of the redundant video group. Of course, the main category may instead be configured as a main tag, the scene categories of the video frames in the group counted, the counted scene categories used as sub-tags, and the main tag and sub-tags together configured as the scene tag of the redundant video group.
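The tag-configuration logic above might be sketched like this. A hedged illustration under stated assumptions: the scene model tree is modelled as a flat mapping from each scene category to its main category, and the category names are hypothetical:

```python
from collections import Counter

def configure_scene_tag(frame_categories, scene_model_tree):
    """Build a scene tag (main tag plus sub-tags) for one redundant video group."""
    categories = set(frame_categories)
    if len(categories) == 1:
        # all frames share one scene category: use it directly as the tag
        return {"main": categories.pop(), "sub": []}
    mains = {scene_model_tree[c] for c in categories}
    if len(mains) == 1:
        # different categories under one main category: main tag + counted sub-tags
        sub = [c for c, _ in Counter(frame_categories).most_common()]
        return {"main": mains.pop(), "sub": sub}
    return None  # frames span several main categories: no single scene tag

tree = {"close-up": "dialogue", "two-shot": "dialogue"}
print(configure_scene_tag(["close-up", "close-up", "two-shot"], tree))
# -> {'main': 'dialogue', 'sub': ['close-up', 'two-shot']}
```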
Then, the terminal configures the scene tag of the redundant video group into the redundant mark.
And correspondingly, displaying the redundant mark of the redundant video group at the mark position in the playing page of the target video. The scene labels in the redundant marks comprise main labels and sub labels, so that a user can conveniently know specific picture scenes in the redundant video group.
In some embodiments, the preset redundant mark includes a duration of the expanded play of the redundant video group, and the step of identifying the redundant video group using the preset redundant mark may further include: and acquiring the time length of the expanded and played redundant video group, and configuring the time length of the expanded and played redundant video group into the redundant mark.
The terminal can calculate and obtain the time length of the expansion and the playing of the redundant video group according to the playing time of the first video frame and the playing time of the last video frame in the redundant video group. Of course, the terminal can also obtain the frame frequency of the target video, and calculate the duration of the expansion and playing of the redundant video group according to the number of video frames and the frame frequency in the redundant video group.
And then, the terminal configures the time length of the expanded and played redundant video group into the redundant marks thereof, and correspondingly, in the playing page of the target video, the redundant marks for identifying the redundant video group comprise the time length of the expanded and played redundant video group, so as to help a user to decide whether to play the redundant video group.
It should be noted that, because the redundant marks may include personalized identification information such as a folding mark, a scene tag, and/or a duration of the expanded playing of the redundant video set, the redundant marks corresponding to different redundant video sets may be different, and specifically configured according to the actual situation of the redundant video set.
205. When the target video is played to the redundant video group, play the redundant video group according to a preset playing mode.
For example, after the terminal identifies the redundant video group in the play page of the target video by using the preset redundant mark, the step of playing the redundant video group according to the preset play mode may include: and if an unfolding instruction for the folding mark of the next redundant video group is received before the key frame of the next redundant video group is played, unfolding a playing interval of the next redundant video group on a playing progress bar, and sequentially playing each frame in the next redundant video group.
In the video playing process, the user can adjust the video playing mode, and as an implementation manner, before playing to the next redundant video set, the user can click the folding mark of the next redundant video set in the playing page and input an unfolding instruction.
For example, referring to fig. 2b, the current play mode is the thumbnail mode and the target video has a total length of 60 minutes. The redundant marks of three redundant video groups are displayed under the playing progress bar of the target video in the playing page. The redundant mark of the first redundant video group is located below the start position of its playing and comprises a folding mark "+" and the playing duration of the first redundant video group, "5 minutes"; the redundant mark of the second redundant video group is located below the start position of its playing and comprises a folding mark "+" and the playing duration "1 minute"; the redundant mark of the third redundant video group is located below the start position of its playing and comprises a folding mark "+" and the playing duration "2 minutes".
The user can click the folding mark "+" of the first redundant video group before playing the first redundant video group in the playing process of the target video, and the terminal determines that the unfolding instruction input by the user on the folding mark is received.
The terminal receives an unfolding instruction input by a user on the folding mark, determines that the user needs to sequentially play the next redundant video group, switches the playing mode to sequential playing, unfolds the playing interval of the next redundant video group in the playing progress bar, and then plays the video frames of each frame according to the playing sequence when the next redundant video group is played.
In some embodiments, before playing to the next redundant video set, if the user does not input the expansion instruction to the redundancy flag of the next redundant video set, the terminal determines that the expansion instruction is not received, plays the next redundant video set according to the thumbnail mode, for example, skips the next redundant video set or plays only the feature video frames therein, and so on.
Therefore, the real-time flexible switching of the playing modes is realized.
In some embodiments, when an expansion instruction input to the redundancy flag is received, a feature video frame of the redundancy video group may also be displayed for the user to preview the video content.
As can be seen from the above, the embodiment of the present invention receives the video playing instruction and obtains the target video indicated by it. During playing, the scene category of each video frame is identified according to its picture content, so the picture content is analyzed to judge whether the content of the video frame is displayed repeatedly. Then, video frames with adjacent playing orders and the same scene category are combined into a redundant video group, and the redundant video group is marked with a preset redundant mark in the playing page of the target video, making it easy for the user to distinguish redundant segments. When the target video is played to the redundant video group, the redundant video group is played according to a preset playing mode. This scheme realizes simultaneous downloading, playing and redundancy analysis of videos: during playing, the scene category of each video frame is obtained by automatically analyzing its picture content, and adjacent video frames with the same scene category are combined into a redundant video group, that is, video segments with repeated content are obtained. The redundant video group is marked with a redundant mark in the playing page so that the user can see the redundancy-analysis result intuitively, and how the screened redundant video groups are played, such as skipped or played normally, can be controlled according to the play mode set by the user. The scheme thus improves the efficiency of regulating the played content, makes regulation of the video playing content very convenient, and improves the user experience.
The method described in the previous examples is described in further detail below by way of example.
For example, referring to fig. 3, in this embodiment, the video playing apparatus will be specifically integrated into a network device, which may be a terminal, a tablet computer, or a personal computer. In addition, the network device at least comprises an acquisition module, a parser, a decoder, a YUV (luminance and chrominance) module, a scaling module and a display module; the video playing device comprises an intelligent analysis module, a category management module and a playing control module, and is specifically as follows:
(I) an acquisition module;
the acquisition module determines a target video indicated by the video playing instruction and a corresponding video source according to the video playing instruction input by the user.
Then, the acquisition module correspondingly loads a streaming media file compression packet of the target video through the network to acquire the target video.
(II) a parser;
because the video streaming media file obtained by loading is a packed compressed package, the parser correspondingly decompresses the video streaming media file according to the packing mode of the video source.
(III) a decoder;
the decoder decodes the video streaming media file, converts the video streaming media file into a video streaming file of YUV color space, and then inputs the video streaming file to the YUV module. Wherein, the video stream file contains the video frame of the target video.
(IV) a YUV module;
and after the YUV module obtains the video stream file of the target video, the YUV module sends the video stream file to the intelligent analysis module for redundancy analysis.
(V) an intelligent analysis module;
after the intelligent analysis module receives the video stream file sent by the YUV module, the video stream file can be segmented into fragments to be analyzed according to preset time length, for example, 1 minute, and each fragment to be analyzed is respectively input into each preset classification model.
The classification model classifies each video frame in the fragment to be analyzed and outputs a classification result containing the scene category of each video frame.
Then, the intelligent analysis module combines the video frames with adjacent playing orders and the same scene category in each fragment to be analyzed into redundant video groups, such as G0, G1, G2, …, Gn.
The intelligent analysis module configures the scene category of the video frames in each redundant video group as the scene category corresponding to that group: for example, the scene category corresponding to G0 is T0, that of G1 is T1, that of G2 is T2, …, and that of Gn is Tn.
Then, the intelligent analysis module sends each redundant video group and its corresponding scene category to the category management module, and sends the uncombined video frames directly to the play control module.
It should be noted that, if the redundant video group is the first segment of the target video to be played, the intelligent analysis module sends the redundant video group directly to the play control module.
(VI) a category management module;
and the category management module receives each redundant video group and the corresponding scene category sent by the intelligent analysis module.
Then, the category management module may combine redundant video groups that are adjacent in play order and whose scene categories belong to the same main category into one group, obtaining a merged redundant video group, and configures the main category as the scene tag of the merged group. For example, redundant video groups that are adjacent in play order and have the same scene category are combined into one group, and that scene category is configured as the scene tag of the merged redundant video group.
If the redundant video set cannot be combined with other adjacent redundant videos, the scene category of the redundant video set is configured as the scene label.
For example, if the redundant video groups G1, G2 and G3 are adjacent in play order, the category management module detects whether the scene categories of G1 and G2 belong to the same main category. If the scene categories of G1 and G2 are the same, G1 and G2 are merged into a new redundant video group, T1 (or T2) is configured as the scene category and scene tag of the merged group, and the module then detects whether the scene category of the merged group matches that of the adjacent redundant video group G3, and so on. If the scene categories of G1 and G2 differ but both belong to the main category E1, G1 and G2 are merged into a new redundant video group and E1 is configured as its scene category and scene tag; alternatively, E1, T1 and T2 together are configured as the scene category and scene tag of the merged group.
If the scene categories of G1 and G2 are different and the scene categories of G1 and G2 do not belong to the same main category, then continuing to detect whether the scene categories of G2 and the redundant video group G3 with adjacent playing order belong to the same main category, and so on.
Therefore, the combination of the redundant video groups and the configuration of the scene tags are realized.
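The G1/G2/G3 walk-through above can be sketched as follows. A minimal illustration under stated assumptions: the scene model tree is a flat mapping from category to main category that also maps each main category to itself, and a merged group is labelled with the shared main category (one of the two labelling options described above):

```python
def merge_by_main_category(groups, scene_model_tree):
    """groups: list of (start, end, category) in play order.

    Adjacent groups whose categories fall under the same main category are
    merged, and the merged group takes the main category as its label.
    """
    merged = []
    for group in groups:
        if merged and scene_model_tree[merged[-1][2]] == scene_model_tree[group[2]]:
            start, _, _ = merged[-1]
            merged[-1] = (start, group[1], scene_model_tree[group[2]])
        else:
            merged.append(group)
    return merged

# T1 and T2 share main category E1; T3 belongs to E2 and stays separate
tree = {"T1": "E1", "T2": "E1", "T3": "E2", "E1": "E1", "E2": "E2"}
print(merge_by_main_category([(0, 10, "T1"), (11, 20, "T2"), (21, 30, "T3")], tree))
# -> [(0, 20, 'E1'), (21, 30, 'T3')]
```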
And then, the category management module sends the uncombined redundant video group, the combined redundant video group and the respective scene labels to the intelligent analysis module.
And the intelligent analysis module sends the uncombined redundant video group, the combined redundant video group and the respective scene labels to the play control module.
(VII) a play control module;
and after receiving the unmixed video frames, the unmixed redundant video groups and the combined redundant video groups, the play control module determines the next played video frame or the next played redundant video group according to the current playing time.
And, the play control module obtains the current play mode, and configures a play identifier, such as a sequential play identifier or a thumbnail play identifier, according to the play mode.
If the next item to be played is an uncombined video frame, the play control module sends the play identifier and the uncombined video frame to the scaling module as play data. Here, the play identifier is the sequential play identifier.
If the next item to be played is a redundant video group, the play control module configures the scene tag of that redundant video group into its redundant mark, acquires the playing duration of the group, and configures the playing duration into the redundant mark as well. Of course, the play control module may also configure the folding mark into the redundant mark. It then sends the play identifier, the redundant video group and the redundant mark to the scaling module as play data. The play identifier is either the sequential play identifier or the thumbnail play identifier.
(VIII) a scaling module;
and after receiving the playing data, the scaling module scales the video frames in the playing data according to the size of the display screen.
If the video frames in the playing data are the non-combined video frames, the playing identification and the scaled non-combined video frames are directly sent to the display module.
And if the video frame in the playing data is the redundant video group, sending the playing identification, the scaled redundant video group and the corresponding redundant mark to the display module.
(IX) a display module.
The display module may use OpenGL (Open Graphics Library) for display.
When receiving the uncombined video frame, the playing identifier is a sequential playing identifier, and the display module directly plays the uncombined video frame.
When the redundant video group and its redundant mark are received, the display module displays the redundant mark corresponding to the redundant video group in the playing page. If the redundant mark comprises a folding mark, the display module obtains the playing time of the redundant video group, determines its playing interval on the playing progress bar in the playing page, and folds that interval on the progress bar. If the play identifier is the sequential play identifier, the video frames in the redundant video group are played in playing order when the group is reached. If the play identifier is the thumbnail play identifier, the redundant video group is skipped directly, or its characteristic video frames are obtained and only those are played.
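The display module's dispatch on the play identifier might be sketched as follows. The identifier strings and the frame-index representation of a group are assumptions for illustration only:

```python
def frames_to_play(group, play_id, feature_frames=()):
    """Decide which frame indices of a redundant group (first, last) to play."""
    first, last = group
    if play_id == "sequential":
        return list(range(first, last + 1))  # play every frame in playing order
    if play_id == "thumbnail":
        return list(feature_frames)          # skip entirely, or feature frames only
    raise ValueError(f"unknown play identifier: {play_id}")

print(frames_to_play((0, 4), "sequential"))       # -> [0, 1, 2, 3, 4]
print(frames_to_play((0, 4), "thumbnail", (2,)))  # -> [2]
```

An empty `feature_frames` in thumbnail mode corresponds to skipping the redundant video group outright.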
In addition, the user can adjust the playing mode at any time in the video playing process, for example, an unfolding instruction is input to the displayed folding mark to adjust the playing mode into a sequential mode, then the playing control module configures a corresponding playing identifier according to the playing mode determined by the user, sends the playing identifier to the scaling module, and then the scaling module sends the playing identifier to the display module to update the existing playing identifier, so that the playing of the video is controlled according to the playing mode adjusted by the user.
As can be seen from the above, in the embodiment of the present invention, the redundant video groups are screened out by automatically analyzing the picture content of the video frames, so as to obtain video segments with repeated content; further, when playing the video, how the screened redundant video groups are played, for example, skipped or played normally, can be controlled according to the play mode set by the user. When watching the video, the user can set the play mode as needed without interrupting playback, thereby controlling how the redundant segments are played and selectively watching or filtering the video content, without repeatedly dragging the progress bar to adjust the played content. The regulation of the video playing content is therefore very convenient, and the user experience is improved.
In order to better implement the above method, the embodiment of the present invention further provides a video playing device, where the video playing device may be integrated in a network device, such as a terminal or a personal computer, where the terminal may include a device such as a mobile phone, a tablet computer, or a notebook computer.
For example, as shown in fig. 4, the video playback apparatus may include an acquisition unit 401, an identification unit 402, a merging unit 403, and a playback unit 404, as follows:
(1) An acquisition unit 401;
The acquisition unit 401 is configured to acquire a target video.
The target video may be a video stored locally in the device, or may be a network video loaded from a network, or the like.
For example, the acquisition unit 401 locally loads a target video from the device when starting playing the video.
For another example, when receiving a video playing instruction or during the video playing process, the obtaining unit 401 loads the target video file in the streaming media format through the network, thereby realizing downloading and playing.
Of course, the obtaining unit 401 may also trigger the redundant identification operation after detecting the target video stored in the local device, and load the target video from the local device to identify the redundant video segment.
(2) An identification unit 402;
The identifying unit 402 is configured to obtain a scene category of a video frame according to a picture content of the video frame in the target video.
A video frame is a single still image; a succession of still images forms a video. The target video is composed of a plurality of video frames arranged in the playing order.
The identifying unit 402 identifies the picture content of each video frame in the target video one by one and determines whether or not the content is redundant, such as scenery, subtitles, or cut-in advertisements.
In some embodiments, to more accurately identify the scene category of the video frame, the identification unit 402 includes a classification subunit and a category subunit: the classifying subunit is used for inputting the target video into a preset classification model, so as to trigger the classification model to classify the video frames according to the picture content of the video frames in the target video; the category subunit is used for acquiring the classification result output by the classification model to obtain the scene category of the video frame.
The preset classification model can be obtained through training of a large number of training videos and corresponding scene categories. The classifying subunit sequentially inputs the video frames in the target video into a preset classifying model according to the playing sequence, or inputs the target video into the preset classifying model, and the classifying model classifies each video frame in the target video according to the playing sequence.
The classification model analyzes and obtains scene categories of the video frames, such as scenery, captions, inserted advertisements or normal scenario, and the like, according to scene classification rules and picture contents in the video frames. The classification model then outputs the scene category of the video frame as a classification result to the category subunit.
And the category subunit receives the classification result output by the classification model and analyzes the classification result to obtain the scene category of the video frame.
In some embodiments, in order to improve the efficiency of video frame classification, there are a plurality of preset classification models, and the classification subunit is specifically configured to: dividing the target video into segments to be analyzed according to preset duration; and respectively inputting the fragments to be analyzed into preset classification models.
For example, the target video is in a streaming media format; the classifying subunit segments the target video according to a preset time length, for example 1 minute, cutting the target video into segments to be analyzed of the preset time length.
The method comprises the steps that a plurality of preset classification models are provided, and the classification subunits respectively input cut fragments to be analyzed into different classification models for classification. Therefore, the parallel processing of video frame identification is realized, and the analysis efficiency is greatly improved.
For example, 3 classification models are preset. After the classification subunit segments the first segment to be analyzed, it inputs that segment into the first classification model for classification; after the second segment to be analyzed is obtained through segmentation, it is input into the second classification model for classification; after the third segment to be analyzed is obtained through segmentation, it is input into the third classification model for classification; after the fourth segment to be analyzed is obtained through segmentation, it is input into the first classification model, or into whichever classification model is currently idle, and so on, so that parallel processing of a plurality of segments to be analyzed is realized.
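The segmentation and round-robin dispatch described above can be sketched as follows (a simplified illustration; the segment length and model count are the example values from the text, and the function name is an assumption):

```python
import math

def split_and_dispatch(total_seconds, segment_seconds=60, num_models=3):
    """Cut the target video into fixed-length segments to be analyzed and
    assign each segment to a classification model round-robin (sketch)."""
    num_segments = math.ceil(total_seconds / segment_seconds)
    # (segment index, model index) pairs; model 0 gets segments 0, 3, 6, ...
    return [(seg, seg % num_models) for seg in range(num_segments)]
```

A real scheduler would instead hand each new segment to the currently idle model, as the text notes; the fixed round-robin shown here is the simplest variant.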
If the target video is stored locally, the classifying subunit may also cut the target video into a plurality of segments with preset duration according to the playing sequence, and input preset classifying models to classify the target video respectively.
In some embodiments, considering that the content of video frames differs greatly between videos of different video categories, in order to improve the accuracy of identifying the scene category of the video frame, the classification subunit is specifically configured to: acquire the video category of the target video; and, according to the video category of the target video, input the target video into a classification model of the same video category.
Therefore, the method and the device realize the targeted identification of the video frame content of videos of different film and television categories, and the identification and judgment of the video frame content are finer and more accurate and are closer to the actual scene demand.
(3) A merging unit 403;
The merging unit 403 is configured to merge video frames with adjacent playing orders and the same scene category into a redundant video group.
If, according to the play order, the number of video frames spaced between two video frames does not exceed a preset value, the merging unit 403 may determine that the play orders of the two video frames are adjacent. The preset value can be 0, 1, or 2, and the like, and is flexibly configured according to actual needs, so that the method has a certain fault tolerance and compatibility.
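The adjacency test just described reduces to a simple predicate (a sketch; the function name is assumed):

```python
def play_orders_adjacent(order_a, order_b, preset_value=0):
    """True when at most `preset_value` frames lie between the two play orders.

    With preset_value=0 only consecutive frames count as adjacent; 1 or 2
    tolerates small gaps, e.g. a single misclassified frame in between.
    """
    return abs(order_a - order_b) - 1 <= preset_value
```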
After identifying the scene types of the video frames, the merging unit 403 merges the video frames with adjacent playing orders and the same scene types into a redundant video group, where the obtained redundant video group is a redundant segment in the target video, and the redundant video group is repeated in content and may need to be skipped.
In some embodiments, the merging unit 403 is specifically configured to: identifying redundant video frames according to scene categories of the video frames; and merging the redundant video frames which are adjacent in play order and have the same scene category into a redundant video group. For the specific implementation, reference may be made to the content described in step 103 in the above embodiment of the video playing method.
In some embodiments, the merging unit 403 is further configured to, after merging video frames that are adjacent in play order and have the same scene category into a redundant video group: configuring scene categories of video frames in the redundant video group as scene categories of the redundant video group; and combining the redundant video groups which are adjacent in play sequence and have scene categories belonging to the same main category into a group.
If the number of video frames spaced between the two redundant video sets does not exceed the preset value according to the play order, the merging unit 403 may determine that the play orders of the two redundant video sets are adjacent. The preset value can be 0, 10 or 20, and the like, and is flexibly configured according to actual needs, so that certain fault tolerance and compatibility are realized.
The merging unit 403 configures the scene category of the video frames in the redundant video group as the scene category of the redundant video group. Scene categories belonging to the same main category means that the scene categories are the same, or that in a preset scene model tree the two scene categories are sub-categories subordinate to the same main category. The preset scene model tree comprises a plurality of main categories, and each main category comprises one or more scene-similar sub-categories, that is, scene categories. For example, the main category may be landscape, and its sub-categories may be scene categories such as grassland, river, mountain, forest, and sea of flowers. If the scene categories of two redundant video groups adjacent in the play order are grassland and river respectively, the merging unit 403 may determine from the preset scene model tree that grassland and river both belong to the main category landscape, that is, the scene categories of the two redundant video groups belong to the same main category.
Therefore, the merging unit 403 merges the redundant video groups with adjacent playing orders and scene categories belonging to the same main category into a group, that is, merges the redundant video segments with the same or similar scenes, so as to avoid the influence of finer granularity of the redundant video groups on the user experience.
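The same-main-category test against the scene model tree can be sketched as below (the tree contents are illustrative examples drawn from the text, not an exhaustive tree):

```python
# Illustrative scene model tree: main categories map to their sub-categories.
# The categories shown are examples from the text, not a complete tree.
SCENE_MODEL_TREE = {
    "landscape": {"grassland", "river", "mountain", "forest", "sea of flowers"},
    "insert": {"cut-in advertisement", "subtitles"},
}

def same_main_category(category_a, category_b, tree=SCENE_MODEL_TREE):
    """Two scene categories belong to the same main category when they are
    equal, or are both sub-categories under one main class of the tree."""
    if category_a == category_b:
        return True
    return any(category_a in subs and category_b in subs
               for subs in tree.values())
```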
It should be noted that, if the playing sequence of the redundant video set is located at the starting position of the target video, the playing unit 404 directly plays the redundant video set, and prohibits the redundant video set from being combined.
(4) A playback unit 404;
The playing unit 404 is configured to play the redundant video group according to a preset playing mode when the target video is played to the redundant video group.
The preset play mode may include a thumbnail mode, a sequential mode, and the like. The thumbnail mode refers to skipping over the redundant video group or playing a preset number of video frames in the redundant video group for a user to browse quickly; the sequential mode refers to normal play of video frames in the redundant video group according to the play order.
The playing unit 404 plays the target video in the playing page, and the user can adjust the playing mode at any time during the playing process. For example, the user may adjust the play mode by switching the virtual key in the play page, or adjust the play mode by switching the key in the remote controller, etc.
If the play mode is the sequential mode, when playing to the redundant video group, the play unit 404 plays the video frames in the sequential mode according to the play order.
If the play mode is the thumbnail mode, when playing to the redundant video set, as an embodiment, the play unit 404 skips the redundant video set and directly plays the video frames after the redundant video set. Of course, a prompt message may also be displayed before or after the skip to alert the user that the redundant video segments have been automatically skipped for them.
If the play mode is the thumbnail mode, as another implementation manner, the playing unit 404 may instead display several video frames of the redundant video group for the user to browse quickly when playing to the redundant video group. In this case, the merging unit 403 is further configured to determine the feature video frames in the redundant video group, and the playing unit 404 is specifically configured to sequentially play the feature video frames of each redundant video group.
The feature video frame may be a video frame whose picture content is representative of the redundant video group.
For example, the merging unit 403 may take the first video frame and/or the last video frame in the redundant video group as feature video frames, or capture one video frame at every preset interval as a feature video frame.
As an embodiment, the merging unit 403 may specifically acquire a play duration of the redundant video group and a frame rate of the target video; calculating the position information of the characteristic video frame according to the playing time length and the frame frequency; and extracting the corresponding video frames from the redundant video group according to the position information to serve as characteristic video frames.
The merging unit 403 may acquire the position of the video frame in the redundant video group in the target video, for example, the time when the video frame is played. Thus, the merging unit 403 may calculate the playing duration of the redundant video group according to the playing time of the first video frame and the playing time of the last video frame in the redundant video group. Of course, the merging unit 403 may also obtain the frame rate of the target video, and calculate the playing duration of the redundant video group according to the number of video frames and the frame rate in the redundant video group. The frame rate of the target video refers to the number of frames displayed per second of the target video, and can be obtained from attribute information of the target video.
Then, the merging unit 403 calculates the positions of the feature video frames to be played according to the playing duration and the frame rate. In one embodiment, the merging unit 403 determines the number of video frames in the redundant video group according to the playing duration and the frame rate, for example by multiplying the playing duration by the frame rate. Then, the merging unit 403 divides the number of video frames in the redundant video group by the preset number of feature video frames to obtain the distribution interval of the feature video frames in the redundant video group, and determines the positions of the feature video frames in the redundant video group according to the distribution interval. Of course, the number of feature video frames may also be configured according to the playing duration of the redundant video group, for example, the number of feature video frames is made equal to the playing duration, or the playing duration is input into a preset calculation formula; this may be flexibly configured according to actual needs.
After obtaining the positions of the feature video frames, the merging unit 403 extracts the corresponding video frames in the redundant video group as feature video frames according to the position information.
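The position calculation just described (frame count = duration × frame rate; distribution interval = frame count ÷ number of feature frames) can be sketched as follows; the clamping of indices and the function name are assumptions:

```python
def feature_frame_positions(play_seconds, frame_rate, num_features):
    """Evenly spaced feature-frame indices within a redundant video group.

    Sketch of the calculation described above: frame count = duration x
    frame rate, distribution interval = frame count // number of features.
    """
    frame_count = int(play_seconds * frame_rate)
    interval = max(frame_count // num_features, 1)
    # Clamp to the last frame in case rounding pushes an index past the end.
    return [min(i * interval, frame_count - 1) for i in range(num_features)]
```

For example, a 10-second group at 24 frames per second with 4 feature frames has 240 frames, an interval of 60, and feature frames at indices 0, 60, 120, and 180.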
So that the playing unit 404 plays the characteristic video frames when the target video is played to the redundant video group.
Therefore, the characteristic video frames are more uniformly selected, and the whole content of the redundant video group can be represented. When the redundant video group is played in the thumbnail mode, the playing unit 404 plays the selected characteristic video frame, so that the user can quickly know the content of the redundant video group without affecting the continuity of the scenario.
As can be seen from the above, the obtaining unit 401 obtains the target video, and the identifying unit 402 then identifies the scene category of each video frame according to its picture content, so as to analyze and identify the picture content and determine whether the content of the video frame is a redundant repetition. The merging unit 403 then merges video frames that are adjacent in playing order and have the same scene category into a redundant video group, and when the target video is played to the redundant video group, the playing unit 404 plays the redundant video group according to a preset playing mode. In this scheme, the scene categories of all video frames in the target video are obtained by automatically analyzing the picture content of the video frames, and adjacent video frames with the same scene category are combined into redundant video groups, that is, video segments with repeated content are obtained; further, when the video is played, how the screened redundant video groups are played, such as skipped or played normally, can be controlled according to the play mode set by the user. Because the redundant segments in the video are filtered automatically, when watching the video the user can set the play mode as needed, control how the redundant segments are played, and selectively watch or filter the video content without repeatedly dragging the progress bar. This improves the efficiency of regulating the played content, makes the regulation of the video playing content very convenient, and improves the user experience.
The apparatus according to the previous embodiments is described in further detail below.
In one embodiment, a video playing device is provided, which is different from the embodiment of the video playing method described above in that: the video playing device in this embodiment further includes a marking unit. The video playing device plays the target video and performs redundant fragment identification, so that redundant analysis of the online video is realized; and the redundant mark is displayed in the playing page, so that the user can conveniently identify the redundant video clip, and further the playing mode is determined. Referring to fig. 5, the video playback apparatus may include an acquisition unit 501, an identification unit 502, a merging unit 503, a marking unit 504, and a playback unit 505, as follows:
(1) An acquisition unit 501;
an acquisition unit 501, configured to acquire a target video.
The acquiring unit 501 may receive a video playing instruction, and acquire a target video indicated by the video playing instruction.
For example, when a user watches an online video, the user selects a video to be played and inputs a video playing instruction.
The acquiring unit 501 receives a video playing instruction input by a user, acquires a target video indicated by the video playing instruction, that is, a video selected by the user, and plays the target video.
(2) An identification unit 502;
The identifying unit 502 is configured to obtain a scene category of a video frame according to a picture content of the video frame in the target video.
(3) A merging unit 503;
The merging unit 503 is configured to merge video frames with adjacent playing orders and the same scene category into a redundant video group.
For example, the identifying unit 502 obtains the target video in the streaming media format through the network and plays it; during playback, the identifying unit 502 analyzes the picture content of the video frames of the streaming target video and obtains the scene category of each video frame.
For example, after acquiring a video frame of a target video, the identifying unit 502 first acquires a scene category of the video frame according to a picture content of the video frame.
If the scene category of a video frame is the same as that of a video frame adjacent in the playing order, the merging unit 503 merges the video frames that are adjacent in the playing order and have the same scene category into a redundant video group. If the scene category of a video frame differs from that of the adjacent video frame in the playing order, the video frame is played normally according to the playing order, while the identifying unit 502 continues to analyze the picture content of the remaining video frames and the merging unit 503 continues merging video frames that are adjacent in the playing order and have the same scene category.
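The grouping of consecutive same-category frames can be sketched as a single pass over the classified frames (a simplified illustration with a gap tolerance of 0; the data shapes are assumptions):

```python
def merge_adjacent(frame_categories):
    """Group consecutive video frames with the same scene category (sketch).

    Returns only groups of two or more frames; a lone frame is played
    normally and never forms a redundant video group here.
    """
    groups = []
    for index, category in enumerate(frame_categories):
        if groups and groups[-1]["category"] == category \
                and groups[-1]["frames"][-1] == index - 1:
            groups[-1]["frames"].append(index)   # extend the current group
        else:
            groups.append({"category": category, "frames": [index]})
    return [g for g in groups if len(g["frames"]) > 1]
```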
(4) A marking unit 504;
a marking unit 504, configured to identify, in the playing page of the target video, a redundant video group by using a preset redundant mark.
The preset redundant mark may be a preset special symbol and/or attribute information of a redundant video group, etc. For example, the redundant marks include folding marks such as a plus sign "+", and of course, information such as scene categories of the redundant video group and/or time durations of the redundant video group may also be included.
The marking unit 504 obtains the position of the video frame in the redundant video group in the target video, for example, the time when the video frame is played. Thus, the marking unit 504 may determine a start time of the redundant video group in the target video, for example, taking a play time of a first video frame in the redundant video group as a start time of the play of the redundant video group.
The information such as the name and the playing progress bar of the target video is displayed in the playing page of the target video, the marking unit 504 finds the starting time position of the playing of the redundant video group in the playing progress bar of the target video, and then displays a redundant mark at the starting time position to identify that the redundant video group starts from there. Of course, the marking unit 504 may also obtain the end time of playing the redundant video set according to the playing time of the last video frame in the redundant video set, so as to display the redundant mark at the start time position and the end time position of the redundant video set.
In some embodiments, the marking unit 504 is specifically configured to: determining the playing time of the redundant video group; determining a mark position in a playing page of the target video according to the playing time; and displaying a preset redundant mark at the mark position to identify the redundant video group.
The playing time of the redundant video group includes a starting time and/or an ending time of playing the redundant video group, and may specifically be determined from the playing times of the first video frame and the last video frame of the redundant video group in the target video: the marking unit 504 uses the playing time of the first video frame as the starting time of playing the redundant video group, and the playing time of the last video frame as the ending time. For example, if the start time of the redundant video group is the 5th minute and the end time is the 10th minute, the playing time is the 5th to the 10th minute.
Then, the marking unit 504 determines, according to the playing time, a marking position for displaying the redundant mark in the playing page of the target video. As an embodiment, the marking unit 504 may determine the start position and/or end position of playing the redundant video group on the playing progress bar of the playing page according to the playing time, and then determine the marking position of the redundant mark according to the start position and/or end position and a preset marking area rule. For example, if the preset marking area rule is a preset distance below the start position, the marking unit 504 takes the position on the progress bar a preset distance below the start position of the redundant video group as the marking position; if the preset marking area rule is the position below the progress bar between the start position and the end position, the marking unit 504 takes the position below the progress bar between the start position and the end position of the redundant video group as the marking position.
Thus, the marking unit 504 obtains a marking position where the redundant mark is displayed.
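The mapping from playing time to a progress-bar marking position can be sketched as below; the rule names, pixel units, and offset value are assumptions for illustration, not from the patent:

```python
def mark_position(start_s, end_s, total_s, bar_width_px,
                  rule="below_start", offset_px=8):
    """Map a redundant group's playing time onto progress-bar coordinates.

    Sketch of the marking-area rules described above; the rule names and
    the pixel offset below the bar are assumed.
    """
    start_x = start_s / total_s * bar_width_px
    end_x = end_s / total_s * bar_width_px
    if rule == "below_start":
        return (start_x, offset_px)                # just below the start position
    if rule == "below_span":
        return ((start_x + end_x) / 2, offset_px)  # centred under the interval
    raise ValueError(f"unknown marking area rule: {rule!r}")
```

For example, a group playing from minute 5 to minute 10 of a one-hour video, on a 720-pixel progress bar, is marked 60 pixels from the left edge under the "below_start" rule.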
Then, the marking unit 504 displays the preset redundant mark at the marking position to identify the redundant video group, so that the user can conveniently know which segments in the target video are redundant segments and determine their positions.
In some embodiments, the redundant mark includes a folding mark for identifying folded redundant video groups and prompting the user that redundant video segments have been folded, and the marking unit 504 may be further configured to: determine the play interval of the redundant video group on the play progress bar in the play page; and fold the play interval on the play progress bar.
After obtaining the playing time of the redundant video group, the marking unit 504 finds a starting position and an ending position of playing the redundant video group in a playing progress bar of the playing page according to the playing time, and takes the progress bar between the starting position and the ending position as a playing interval.
Then, taking the playing interval as the marking position, the marking unit 504 folds the playing interval on the playing progress bar, so that the redundant video group is folded on the playing progress bar. The user can thus perceive the identification of the redundant video segment more intuitively, and the folding mark in the redundant mark prompts the user to notice the folded redundant segment, improving the viewing experience.
In some embodiments, the preset redundant markers include scene categories, and the marking unit 504 may be further configured to: configuring a scene tag for the redundant video group according to the scene category of the video frame in the redundant video group; the scene tags of the redundant video groups are configured into redundant tags.
If the scene categories of the video frames included in the redundant video group are all the same, the marking unit 504 configures that scene category as the scene tag of the redundant video group; or, according to a preset scene model tree, acquires the main category to which the scene category of the video frames belongs and uses the main category as the scene tag of the redundant video group.
If the scene categories of the video frames included in the redundant video group differ but belong to the same main category, the marking unit 504 uses that main category as the scene tag of the redundant video group. Of course, the main category may also be configured as a main tag, the scene categories of the video frames in the redundant video group counted, the scene categories obtained by statistics used as sub-tags, and the main tag and sub-tags together configured as the scene tag of the redundant video group.
Then, the scene tag of the redundant video group is configured into a redundant tag by the tagging unit 504.
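The main-tag-plus-sub-tags composition described above might be sketched as follows (the tag dictionary shape is an assumption; frames spanning several main categories are left to the separate handling the text describes):

```python
from collections import Counter

def scene_tag(frame_categories, tree):
    """Build a redundant group's scene tag: main tag plus sub-tags (sketch).

    Assumes every category appears as a sub-category in `tree`; returns None
    when the frames span more than one main category, a case handled
    separately in the text.
    """
    mains = {main for cat in frame_categories
             for main, subs in tree.items() if cat in subs}
    if len(mains) != 1:
        return None
    counts = Counter(frame_categories)   # statistics over the frames' categories
    return {"main": mains.pop(),
            "subs": [cat for cat, _ in counts.most_common()]}
```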
Correspondingly, the redundant mark of the redundant video group is displayed at the marking position in the playing page of the target video. The scene tag in the redundant mark comprises the main tag and the sub-tags, so that the user can conveniently learn the specific picture scenes in the redundant video group.
In some embodiments, the preset redundant mark includes a duration of the expanded play of the redundant video group, and the marking unit 504 may be further configured to: and acquiring the time length of the expanded and played redundant video group, and configuring the time length of the expanded and played redundant video group into the redundant mark.
The marking unit 504 may calculate a duration of the expanded play of the redundant video group according to the play time of the first video frame and the play time of the last video frame in the redundant video group. Of course, the marking unit 504 may also obtain the frame rate of the target video, and calculate the duration of the expanded playing of the redundant video group according to the number of video frames and the frame rate in the redundant video group.
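Either of the two duration calculations just described can be expressed in one helper (a sketch; the function name and parameter names are assumptions):

```python
def expanded_play_duration(first_play_s=None, last_play_s=None,
                           frame_count=None, frame_rate=None):
    """Duration of a redundant group's expanded play, by either method above:
    from the first and last frames' play times, or from frame count and
    frame rate."""
    if first_play_s is not None and last_play_s is not None:
        return last_play_s - first_play_s
    if frame_count is not None and frame_rate:
        return frame_count / frame_rate
    raise ValueError("need the two play times, or a frame count and frame rate")
```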
Then, the marking unit 504 configures the duration of the expanded playing of the redundant video group into its redundant marks, and correspondingly, in the playing page of the target video, the redundant mark identifying the redundant video group includes the duration of the expanded playing of the redundant video group, so as to help the user decide whether to play the redundant video group.
It should be noted that, because the redundant marks may include personalized identification information such as a folding mark, a scene tag, and/or a duration of the expanded playing of the redundant video set, the redundant marks corresponding to different redundant video sets may be different, and specifically configured according to the actual situation of the redundant video set.
(5) A playback unit 505;
The playing unit 505 is configured to play the redundant video group according to a preset playing mode when the target video is played to the redundant video group.
For example, after the marking unit 504 marks the redundant video group by using the preset redundant mark in the playing page of the target video, if the playing unit 505 receives an expansion instruction for the folding mark of the next redundant video group before playing the key frame of the next redundant video group, the playing interval of the next redundant video group is expanded on the playing progress bar, and each frame in the next redundant video group is sequentially played.
In the video playing process, the user can adjust the video playing mode, and as an implementation manner, before playing to the next redundant video set, the user can click the folding mark of the next redundant video set in the playing page and input an unfolding instruction.
The playing unit 505 receives an expansion instruction input by the user to the folding mark, determines that the user wants to sequentially play the next redundant video group, switches the playing mode to sequential playing, expands the playing interval in the playing progress bar, and then plays the video frames of each frame according to the playing sequence when playing the redundant video group.
In one embodiment, if, before the next redundant video group is played, the user does not input an expansion instruction on the redundant mark of the next redundant video group, the playing unit 505 determines that no expansion instruction has been received and plays the next redundant video group in the thumbnail mode, for example by skipping it or playing only its feature video frames.
Therefore, the real-time flexible switching of the playing modes is realized.
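The per-group mode decision described in the last few paragraphs can be sketched as below (the mode and behaviour names are assumptions for illustration):

```python
def resolve_group_mode(expand_instruction_received, thumbnail_behaviour="skip"):
    """Decide how the next redundant group is played (sketch; names assumed).

    An expand instruction on the group's folding mark switches that group to
    the sequential mode; otherwise the thumbnail mode applies (skip the
    group, or play only its feature frames).
    """
    if expand_instruction_received:
        return {"mode": "sequential"}
    return {"mode": "thumbnail", "behaviour": thumbnail_behaviour}
```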
In some embodiments, when receiving an unfolding instruction input on the redundant mark, the playing unit 505 may further display the feature video frame of the redundant video group so that the user can preview the video content.
As can be seen from the above, in the embodiment of the present invention, the obtaining unit 501 receives a video playing instruction and obtains the target video it indicates. The identification unit 502 plays the target video and, during playing, identifies the scene category of each video frame according to its picture content, thereby analyzing the picture content to judge whether the content of a video frame is a redundant repetition. The merging unit 503 then merges video frames that are adjacent in playing order and share the same scene category into a redundant video group, and the marking unit 504 marks the redundant video group with a preset redundant mark in the playing page of the target video, so that the user can readily identify the redundant fragments. When the target video is played to the redundant video group, the playing unit 505 plays it according to the preset playing mode. The scheme thus supports simultaneous downloading, playing, and redundancy analysis of a video: by automatically analyzing the picture content of the video frames during playing, the scene category of each frame in the target video is obtained; frames with the same scene category are merged into redundant video groups, i.e., video fragments with repeated content; the redundant video groups are marked with redundant marks in the playing page so that the user can intuitively see the redundancy-analysis result; and during playing, how the screened redundant video groups are played, such as skipped or played normally, can be controlled according to the playing mode set by the user.
This scheme improves the efficiency of regulating the played content, makes the regulation of video playing content very convenient, and improves the user experience.
Accordingly, an embodiment of the present invention also provides a terminal. As shown in fig. 6, the terminal may include a Radio Frequency (RF) circuit 601, a memory 602 including one or more computer-readable storage media, an input unit 603, a display unit 604, a sensor 605, an audio circuit 606, a wireless fidelity (WiFi) module 607, a processor 608 including one or more processing cores, and a power supply 609. Those skilled in the art will appreciate that the terminal structure shown in fig. 6 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:
the RF circuit 601 may be used for receiving and transmitting signals during the sending and receiving of information or during a call; in particular, after downlink information of a base station is received, it is handed to one or more processors 608 for processing; in addition, uplink data is transmitted to the base station. Typically, the RF circuit 601 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 601 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), and the like.
The memory 602 may be used to store software programs and modules, which the processor 608 runs to perform various functional applications and data processing. The memory 602 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like, and the data storage area may store data created according to the use of the terminal (such as audio data or a phonebook). In addition, the memory 602 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 602 may further include a memory controller to provide the processor 608 and the input unit 603 with access to the memory 602.
The input unit 603 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. In one specific embodiment, the input unit 603 may include a touch-sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or touch pad, may collect touch operations by a user on or near it (e.g., operations performed with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection apparatus according to a preset program. Optionally, the touch-sensitive surface may comprise two parts: a touch detection apparatus and a touch controller. The touch detection apparatus detects the position of the user's touch, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection apparatus, converts it into touch-point coordinates, and sends them to the processor 608; it can also receive commands from the processor 608 and execute them. The touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface, the input unit 603 may comprise other input devices, including but not limited to one or more of a physical keyboard, function keys (such as volume control keys and a switch key), a trackball, a mouse, and a joystick.
The display unit 604 may be used to display information input by or provided to the user and the various graphical user interfaces of the terminal, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 604 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch-sensitive surface may overlay the display panel; when a touch operation on or near it is detected, the touch operation is passed to the processor 608 to determine the type of touch event, and the processor 608 then provides a corresponding visual output on the display panel according to the type of touch event. Although in fig. 6 the touch-sensitive surface and the display panel are implemented as two separate components for the input and output functions, in some embodiments the touch-sensitive surface may be integrated with the display panel to implement both functions.
The terminal may also include at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel and/or the backlight when the terminal is moved to the ear. As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (typically three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications for recognizing the terminal's attitude (such as landscape-portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition-related functions (such as a pedometer or tapping detection).
The audio circuit 606, a speaker, and a microphone may provide an audio interface between the user and the terminal. The audio circuit 606 may transmit the electrical signal converted from received audio data to the speaker, where it is converted into a sound signal for output; on the other hand, the microphone converts a collected sound signal into an electrical signal, which the audio circuit 606 receives and converts into audio data; the audio data is output to the processor 608 for processing and then sent, for example, to another terminal via the RF circuit 601, or output to the memory 602 for further processing. The audio circuit 606 may also include an earphone jack to provide communication between a peripheral earphone and the terminal.
WiFi is a short-range wireless transmission technology. Through the WiFi module 607, the terminal can help the user send and receive e-mail, browse web pages, access streaming media, and the like, providing the user with wireless broadband Internet access. Although fig. 6 shows the WiFi module 607, it is understood that it is not an essential part of the terminal and can be omitted as required without changing the essence of the invention.
The processor 608 is a control center of the terminal, and connects various parts of the entire terminal using various interfaces and lines, and performs various functions of the terminal and processes data by running or executing software programs and/or modules stored in the memory 602, and calling data stored in the memory 602, thereby performing overall monitoring of the terminal. Optionally, the processor 608 may include one or more processing cores; preferably, the processor 608 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 608.
The terminal also includes a power supply 609 (e.g., a battery) for powering the various components. Preferably, the power supply may be logically connected to the processor 608 via a power management system, so that charging, discharging, and power-consumption management are handled by the power management system. The power supply 609 may also include any one or more of a direct-current or alternating-current power supply, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the terminal may further include a camera, a bluetooth module, etc., which will not be described herein. Specifically, in this embodiment, the processor 608 in the terminal loads executable files corresponding to the processes of one or more application programs into the memory 602 according to the following instructions, and the processor 608 executes the application programs stored in the memory 602, so as to implement various functions:
acquiring a target video;
acquiring scene categories of video frames according to picture contents of the video frames in the target video;
combining video frames adjacent in playing sequence and same in scene category into a redundant video group;
and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
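The merge step above (combining play-order-adjacent frames of the same scene category into a redundant video group) can be sketched as follows. This is a minimal illustration under assumptions: the function name, the per-frame label list, and the minimum run length are all hypothetical, not taken from the patent's implementation.

```python
def merge_redundant_groups(scene_labels, min_len=2):
    """Group consecutive frames that share the same scene category.

    scene_labels: list of scene-category labels, one per frame, in play order.
    Returns (start_index, end_index, category) tuples for runs of at least
    `min_len` frames, i.e. the candidate redundant video groups.
    """
    groups = []
    start = 0
    for i in range(1, len(scene_labels) + 1):
        # A run ends at the list's end or where the category changes.
        if i == len(scene_labels) or scene_labels[i] != scene_labels[start]:
            if i - start >= min_len:
                groups.append((start, i - 1, scene_labels[start]))
            start = i
    return groups

# Frames 1-3 share the "fight" category and form one redundant group.
groups = merge_redundant_groups(["talk", "fight", "fight", "fight", "talk"])
```

A player would then consult these index intervals when playback reaches them and apply the preset playing mode.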
As can be seen from the above, the embodiment of the invention acquires the target video and then identifies the scene category of each video frame according to its picture content, thereby analyzing the picture content to judge whether the content of a video frame is a redundant repetition; video frames that are adjacent in playing order and share the same scene category are merged into a redundant video group, and when the target video is played to the redundant video group, the redundant video group is played according to a preset playing mode. In this scheme, the scene category of each video frame in the target video is obtained by automatically analyzing the picture content of the video frames, and adjacent frames with the same scene category are merged into redundant video groups, i.e., video fragments with repeated content; further, during playing, how the screened redundant video groups are played, such as skipped or played normally, can be controlled according to the playing mode set by the user. The scheme automatically screens out the redundant fragments in a video: when watching the video, the user can set the playing mode as required, control how the redundant fragments are played, and selectively watch or filter the video content without repeatedly dragging the progress bar. This improves the efficiency of regulating the played content, makes the regulation of video playing content very convenient, and improves the user experience.
The embodiment of the invention also provides a video playing device, which may be a personal computer, a smartphone, a tablet computer, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a portable computer, or other terminal equipment with a display function.
Referring to fig. 7, the video playback apparatus includes: a display 701, a processor 702, a memory 703, and a video playback program stored on the memory and executable on the processor. In addition, the video playing device may further include a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, and/or a WiFi module, etc.
The processor 702 may be, for example, a CPU; the memory 703 may be a high-speed RAM memory or a stable (non-volatile) memory such as a disk memory. Optionally, the memory 703 may be a storage device independent of the processor 702.
In the video playback device shown in fig. 7, the display screen is mainly used for receiving the target video signal processed by the video playback program and displaying the target video under the control of the processor 702; the processor 702 may call the video playing program stored in the memory 703 to perform the following operations:
Acquiring a target video; acquiring scene categories of video frames according to picture contents of the video frames in the target video; combining video frames adjacent in playing sequence and same in scene category into a redundant video group; and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
In some embodiments, the processor 702 may also call a video playing program stored in the memory 703 to perform the following operations: determining a characteristic video frame in the redundant video group; the feature video frames of each redundant video group are played sequentially.
In some embodiments, the processor 702 may also call a video playing program stored in the memory 703 to perform the following operations: and identifying the redundant video group by using a preset redundant mark in the playing page of the target video.
In some embodiments, the processor 702 may also call a video playing program stored in the memory 703 to perform the following operations: determining the playing time of the redundant video group; determining a mark position in a playing page of the target video according to the playing time; and displaying a preset redundant mark at the mark position to identify the redundant video group.
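Mapping a redundant group's playing time to a mark position on the progress bar, as in the operations above, amounts to a linear time-to-pixel conversion. The following sketch is an assumption for illustration only; the function name and the pixel-based mapping are not from the patent.

```python
def mark_position(group_start_s, group_end_s, video_duration_s, bar_width_px):
    """Map a redundant group's play interval onto progress-bar pixel coords.

    Returns the (left, right) pixel positions where the preset redundant
    mark would be displayed and the interval folded.
    """
    x0 = round(group_start_s / video_duration_s * bar_width_px)
    x1 = round(group_end_s / video_duration_s * bar_width_px)
    return x0, x1

# A redundant group spanning 30 s to 45 s of a 120 s video on an 800 px bar.
coords = mark_position(30.0, 45.0, 120.0, 800)
```

The same mapping in reverse would let a click on the fold mark be resolved back to the group's playing time.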
In some embodiments, the redundant marks include fold marks, and the processor 702 may also call a video playback program stored in the memory 703 to: determining a play interval of the redundant video group on a play progress bar in a play page; folding the playing interval on the playing progress bar.
In some embodiments, the processor 702 may also call a video playing program stored in the memory 703 to perform the following operations: and if an unfolding instruction for the folding mark of the next redundant video group is received before the key frame of the next redundant video group is played, unfolding a playing interval of the next redundant video group on a playing progress bar, and sequentially playing each frame in the next redundant video group.
In some embodiments, the redundant marks further include scene tags and/or the duration of the expanded playing of the redundant video group.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present invention provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the video playing methods provided by the embodiment of the present invention. For example, the instructions may perform the steps of:
Acquiring a target video; acquiring scene categories of video frames according to picture contents of the video frames in the target video; combining video frames adjacent in playing sequence and same in scene category into a redundant video group; and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode.
In some embodiments, the instructions may further perform the steps of: determining a characteristic video frame in the redundant video group; the feature video frames of each redundant video group are played sequentially.
In some embodiments, the instructions may further perform the steps of: and identifying the redundant video group by using a preset redundant mark in the playing page of the target video.
In some embodiments, the instructions may further perform the steps of: determining the playing time of the redundant video group; determining a mark position in a playing page of the target video according to the playing time; and displaying a preset redundant mark at the mark position to identify the redundant video group.
In some embodiments, the redundant indicia includes a fold-over indicia, and the instructions may further perform the steps of: determining a play interval of the redundant video group on a play progress bar in a play page; folding the playing interval on the playing progress bar.
In some embodiments, the instructions may further perform the steps of:
and if an unfolding instruction for the folding mark of the next redundant video group is received before the key frame of the next redundant video group is played, unfolding a playing interval of the next redundant video group on a playing progress bar, and sequentially playing each frame in the next redundant video group.
In some embodiments, the preset redundant marks include scene tags and/or the duration of the expanded playing of the redundant video group.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
The steps in any video playing method provided by the embodiment of the present invention can be executed due to the instructions stored in the storage medium, so that the beneficial effects that can be achieved by any video playing method provided by the embodiment of the present invention can be achieved, and detailed descriptions of the previous embodiments are omitted herein.
The foregoing describes in detail a video playing method, apparatus, device and storage medium provided by the embodiments of the present invention, and specific examples are applied to illustrate the principles and implementations of the present invention, where the foregoing examples are only used to help understand the method and core idea of the present invention; meanwhile, as those skilled in the art will vary in the specific embodiments and application scope according to the ideas of the present invention, the present description should not be construed as limiting the present invention in summary.
Claims (9)
1. A video playing method, comprising:
acquiring a target video;
acquiring the film and television category of the target video, inputting the target video into a classification model of the same film and television category according to the film and television category of the target video, and triggering the classification model to classify video frames according to the picture content of the video frames in the target video; obtaining a classification result output by the classification model to obtain scene categories of video frames;
identifying redundant video frames with adjacent playing orders and identical scene categories according to the scene categories of the video frames, wherein the redundant video frames comprise video frames with scene categories belonging to the redundant categories;
merging the redundant video frames with adjacent playing sequences and same scene categories into a redundant video group, wherein the redundant video group comprises video fragments with similar or same picture content;
and when the target video is played to the redundant video group, playing the redundant video group according to a preset playing mode, wherein the preset playing mode is a playing mode selected by a user, the preset playing mode comprises a thumbnail mode, and the thumbnail mode comprises playing a preset number of video frames in the redundant video group.
2. The method of claim 1, wherein the merging the redundant video frames that are adjacent in the play order and have the same scene category into a redundant video group, further comprises:
and identifying the redundant video group by using a preset redundant mark in the playing page of the target video.
3. The method of claim 2, wherein the identifying the redundant video group in the play page of the target video using a preset redundancy flag comprises:
determining the playing time of the redundant video group;
determining a mark position in a playing page of the target video according to the playing time;
and displaying a preset redundant mark at the mark position so as to identify the redundant video group.
4. The method of claim 3, wherein the redundant mark comprises a fold mark; the determining a mark position in a play page of the target video comprises determining a play interval of the redundant video group on a play progress bar in the play page;
and the displaying a preset redundant mark at the mark position comprises folding the play interval on the play progress bar.
5. The method of claim 4, wherein playing the redundant video group according to a preset play mode comprises:
and if an unfolding instruction for the folding mark of the next redundant video group is received before the key frame of the next redundant video group is played, unfolding the playing interval of the next redundant video group on the playing progress bar, and sequentially playing each frame in the next redundant video group.
6. The method of any of claims 2-4, wherein the redundant mark further comprises a scene tag and/or a duration of the expanded play of the redundant video group.
7. A video playback device, comprising:
the acquisition unit is used for acquiring the target video;
the identification unit is used for acquiring the film and television category of the target video, inputting the target video into a classification model of the same film and television category according to the film and television category of the target video, and triggering the classification model to classify the video frames according to the picture content of the video frames in the target video; obtaining a classification result output by the classification model to obtain scene categories of video frames;
the video frame identification unit is used for identifying redundant video frames with adjacent playing orders and same scene categories according to the scene categories of the video frames, wherein the redundant video frames comprise video frames with the scene categories belonging to the redundant categories;
The merging unit is used for merging the redundant video frames with adjacent playing sequences and same scene categories into a redundant video group, wherein the redundant video group comprises video fragments with similar or same picture content;
and the playing unit is used for playing the redundant video group according to a preset playing mode when the target video is played to the redundant video group, wherein the preset playing mode is a playing mode selected by a user, the preset playing mode comprises a thumbnail mode, and the thumbnail mode comprises playing a preset number of video frames in the redundant video group.
8. A video playback device, the video playback device comprising: a display screen, a processor, a memory, and a video playback program stored on the memory and executable on the processor, wherein:
the display screen is used for displaying the target video;
the video playback program when executed by the processor implements the steps of the video playback method of any one of claims 1 to 6.
9. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the video playing method of any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810926642.2A CN110209879B (en) | 2018-08-15 | 2018-08-15 | Video playing method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110209879A CN110209879A (en) | 2019-09-06 |
CN110209879B true CN110209879B (en) | 2023-07-25 |
Family
ID=67780009
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810926642.2A Active CN110209879B (en) | 2018-08-15 | 2018-08-15 | Video playing method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110209879B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111010619A (en) * | 2019-12-05 | 2020-04-14 | 北京奇艺世纪科技有限公司 | Method, apparatus, computer device and storage medium for processing short video data |
CN111209440B (en) * | 2020-01-13 | 2023-04-14 | 深圳市雅阅科技有限公司 | Video playing method, device and storage medium |
CN111416997B (en) * | 2020-03-31 | 2022-11-08 | 百度在线网络技术(北京)有限公司 | Video playing method and device, electronic equipment and storage medium |
CN111541912B (en) * | 2020-04-30 | 2022-04-22 | 北京奇艺世纪科技有限公司 | Video splitting method and device, electronic equipment and storage medium |
CN111581433B (en) * | 2020-05-18 | 2023-10-10 | Oppo广东移动通信有限公司 | Video processing method, device, electronic equipment and computer readable medium |
CN111918025A (en) * | 2020-06-29 | 2020-11-10 | 北京大学 | Scene video processing method and device, storage medium and terminal |
CN112911402B (en) * | 2021-01-19 | 2023-07-18 | 惠州Tcl移动通信有限公司 | Video playing method, device, terminal and computer readable storage medium |
CN113115054B (en) * | 2021-03-31 | 2022-05-06 | 杭州海康威视数字技术股份有限公司 | Video stream encoding method, device, system, electronic device and storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101025987A (en) * | 2006-02-21 | 2007-08-29 | 广州市纽帝亚资讯科技有限公司 | Video play fast forward/fast rewind method and device based on video content |
CN102222111B (en) * | 2011-06-30 | 2012-10-31 | 山东神戎电子股份有限公司 | Method for retrieving high-definition video content |
US9058846B2 (en) * | 2013-07-31 | 2015-06-16 | Customplay Llc | Video map synchronizing diagnostic information |
US9236093B2 (en) * | 2013-08-01 | 2016-01-12 | Customplay Llc | Video bookmark generating data |
CN103686418B (en) * | 2013-12-27 | 2017-11-28 | 联想(北京)有限公司 | The method and electronic equipment of a kind of information processing |
CN106375875A (en) * | 2016-09-29 | 2017-02-01 | 乐视控股(北京)有限公司 | Video stream play method and apparatus |
Non-Patent Citations (2)
Title |
---|
Dynamic scene classification using redundant spatial scenelets; Liang Du et al.; IEEE Transactions on Cybernetics, Vol. 46, No. 9; full text *
Video summarization method based on HOG-LBP features and SVM classifier; Cheng Haiying; Wang Fengsui; Zhu Shuming; Journal of Sichuan University of Science & Engineering (Natural Science Edition), No. 04; full text *
Also Published As
Publication number | Publication date |
---|---|
CN110209879A (en) | 2019-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110209879B (en) | Video playing method, device, equipment and storage medium | |
CN109819179B (en) | Video editing method and device | |
US10200634B2 (en) | Video generation method, apparatus and terminal | |
CN104967801B (en) | A kind of video data handling procedure and device | |
CN106791958B (en) | Position mark information generation method and device | |
CN112437353B (en) | Video processing method, video processing device, electronic apparatus, and readable storage medium | |
CN112312217A (en) | Image editing method and device, computer equipment and storage medium | |
CN110213661A (en) | Control method, smart television and the computer readable storage medium of full video | |
CN113411680B (en) | Multimedia resource playing method, device, terminal and storage medium | |
CN108228776B (en) | Data processing method, data processing device, storage medium and electronic equipment | |
CN106412291A (en) | Equipment control method and mobile terminal | |
CN113596555B (en) | Video playing method and device and electronic equipment | |
CN115017340A (en) | Multimedia resource generation method and device, electronic equipment and storage medium | |
KR20150023148A (en) | Method and apparatus for managing images on electronic device | |
CN110955788A (en) | Information display method and electronic equipment | |
CN111491205B (en) | Video processing method and device and electronic equipment | |
CN110784762B (en) | Video data processing method, device, equipment and storage medium | |
CN111491124B (en) | Video processing method and device and electronic equipment | |
CN111128252B (en) | Data processing method and related equipment | |
CN113793407A (en) | Dynamic image production method, mobile terminal and storage medium | |
US20230368338A1 (en) | Image display method and apparatus, and electronic device | |
CN114390205B (en) | Shooting method and device and electronic equipment | |
EP4460010A1 (en) | Video playback method and apparatus, and electronic device | |
CN111312207A (en) | Text-to-audio method and device, computer equipment and storage medium | |
CN115379113A (en) | Shooting processing method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |