CN113873323A - Video playing method and device, electronic equipment and medium
- Publication number: CN113873323A (application CN202110857081.7A)
- Authority: CN (China)
- Prior art keywords: video, target, attribute, candidate, information
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/4394—Processing of audio elementary streams involving operations for analysing the audio stream, e.g. detecting features or characteristics in audio streams
- H04N21/44008—Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N21/47217—End-user interface for controlling playback functions for recorded or on-demand content, e.g. using progress bars, mode or play-point indicators or bookmarks
- H04N21/4884—Data services, e.g. news ticker, for displaying subtitles
Abstract
The present disclosure provides a video playing method, an apparatus, an electronic device and a medium, relating to the field of computer technology, and in particular to video playing, cloud computing and cloud service technologies. The specific implementation scheme is as follows: selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video, the candidate video attributes including at least one of: entities, lines, or barrages (bullet-screen comments); and extracting video content to be played from the target video according to the target video attribute. The disclosure automatically locates and plays the video content the user wants, without the user having to manually control the playing progress, and thereby improves video playing efficiency.
Description
Technical Field
The present disclosure relates to the field of computer technology, in particular to video playing, cloud computing and cloud service technologies, and more particularly to a video playing method, apparatus, electronic device, and medium.
Background
At present, mainstream video resources such as movies, television shows or variety shows generally run for more than one hour, so a user needs to spend a long time to watch a video in full.
Existing video playing software generally only lets a user manually drag a progress bar to roughly locate the video clip they want to watch.
Disclosure of Invention
The present disclosure provides a method, apparatus, electronic device, and medium for improving video playing efficiency.
According to an aspect of the present disclosure, there is provided a video playing method, including:
selecting a target video attribute from the candidate video attributes according to the attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entities, lines, or barrages;
and extracting video content to be played from the target video according to the target video attribute.
According to another aspect of the present disclosure, there is provided a video playback apparatus including:
the target video attribute selection module is used for selecting a target video attribute from the candidate video attributes according to the attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entities, lines, or barrages;
and the video content extraction module is used for extracting the video content to be played from the target video according to the target video attribute.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method according to any one of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure;
FIG. 2A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure;
FIG. 2B is an interface schematic diagram of target bullet screen information disclosed according to an embodiment of the present disclosure;
FIG. 2C is an interface schematic diagram of target barrage totals disclosed according to an embodiment of the present disclosure;
FIG. 3A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure;
FIG. 3B is an interface schematic diagram of candidate entity information disclosed according to an embodiment of the present disclosure;
FIG. 4A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure;
FIG. 4B is an interface schematic diagram of target speech-line information disclosed according to an embodiment of the present disclosure;
FIG. 5 is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of a video playing apparatus disclosed according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of an electronic device for implementing a video playing method disclosed in an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
During the applicant's research and development it was found that, when a user plays a video with existing video playing software and wants to watch a favorite video clip, the user usually has to manually drag the progress bar at the bottom of the video to adjust the playing progress and locate that clip. However, a user watching a video for the first time generally does not know the exact playing time of the clip they want, and therefore has to adjust the progress bar repeatedly to locate it accurately, which greatly reduces video playing efficiency and degrades the viewing experience.
Fig. 1 is a flowchart of a video playing method disclosed in an embodiment of the present disclosure, and this embodiment may be applied to a case of performing skip playing on a target video. The method of the present embodiment may be executed by a video playing apparatus disclosed in the embodiments of the present disclosure, and the apparatus may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capability.
As shown in fig. 1, the video playing method disclosed in this embodiment may include:
s101, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video; the candidate video attributes include at least one of: entity, lines, or barrage.
The target video is a video the user is watching or is about to watch. It may be a video resource stored locally on the client, in which case the target video is played locally, or a video resource stored in the cloud, in which case the target video is played online. The candidate video attributes are extracted from the video content of the target video and include at least one of an entity, lines, or a barrage. An entity is an object appearing in the video content of the target video, such as a character, a building, a car or a landscape; the lines are the dialogue spoken by the roles in the video content of the target video; and the barrage (bullet-screen comments, also known as danmaku) refers to the comments sent over any video frame of the target video by users who have watched it or are watching it.
In one embodiment, the attribute information in the target video is extracted in advance according to the candidate video attributes of the target video, and the attribute information corresponding to each candidate video attribute is determined. Specifically, the following A, B and C cases may be included:
A. When the candidate video attribute is the entity attribute, entity information contained in each video frame of the target video is identified by an entity recognition algorithm and used as the attribute information corresponding to the entity attribute. Each piece of entity information is associated with the video frames it belongs to, so that at least one video frame containing a given piece of entity information can be determined from that entity information. Optionally, if the entity is a role, face information in each video frame of the target video is identified by a face recognition algorithm and used as the attribute information corresponding to the entity attribute.
B. When the candidate video attribute is the line attribute, the audio data of each video frame of the target video is identified by a speech recognition technology to determine the line information contained in the target video as the attribute information corresponding to the line attribute; alternatively, the subtitles in each video frame of the target video are recognized by an optical character recognition technology to determine the line information contained in the target video as the attribute information corresponding to the line attribute. Each piece of line information is associated with the video frames it belongs to, so that at least one video frame containing a given piece of line information can be determined from that line information.
C. When the candidate video attribute is the barrage attribute, barrage information contained in each video frame of the target video is identified by an optical character recognition technology and used as the attribute information corresponding to the barrage attribute. Each piece of barrage information is associated with the video frames it belongs to, so that at least one video frame containing a given piece of barrage information can be determined from that barrage information.
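The three cases share the same indexing step: run a recogniser over each frame and remember which frames each piece of attribute information belongs to. A minimal sketch follows; the `recognize` callable is a hypothetical stand-in for the entity-recognition, speech-recognition or OCR component named above, which the patent does not tie to any concrete library.

```python
from collections import defaultdict

def index_attribute_info(frames, recognize):
    """Associate each piece of attribute information with the video
    frames that contain it, as in cases A, B and C above.

    frames    -- iterable of (frame_index, frame_data) pairs
    recognize -- callable returning the attribute information found in
                 one frame, e.g. a face recogniser (case A), a subtitle
                 or speech recogniser (case B) or a barrage OCR pass
                 (case C); a hypothetical stand-in, not a patent API
    """
    index = defaultdict(list)
    for frame_idx, frame in frames:
        for info in recognize(frame):
            index[info].append(frame_idx)  # associate info with its frames
    return index
```

With a face recogniser plugged in, the index might map "role A" to frames 100 through 200, which is exactly the association relationship the later extraction steps rely on.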
After determining the attribute information of each candidate video attribute, the candidate video attributes and their attribute information are displayed as a control in the interface of the target video. While watching the target video, the user can select candidate attributes and attribute information in the control according to their viewing needs.
Specifically, after the user selects any candidate video attribute in the control, the control displays the attribute information included in that candidate video attribute, and the user then selects at least one piece of target attribute information from it. The at least one piece of target attribute information selected by the user is acquired; if the user does not subsequently select another candidate video attribute, that candidate video attribute is taken as the target video attribute and its selected attribute information as the target attribute information of the target video attribute. For example, when the user selects only the attribute information "role A" and "role B" of the candidate video attribute "entity", the "entity" is taken as the target video attribute, and "role A" and "role B" are taken as the target attribute information of "entity".
If the user subsequently selects at least one other candidate video attribute, the attribute information of the other candidate video attributes is screened according to the attribute information already selected for the first candidate video attribute, and the remaining attribute information of the other candidate video attributes after screening is displayed for the user to choose from. Finally, the candidate video attribute selected first and the at least one other candidate video attribute selected afterwards are together taken as the target video attributes, and the selected attribute information of the first candidate video attribute together with the selected remaining attribute information of the other candidate video attributes is taken as the target attribute information of the target video attributes.
For example, assume the user first selects the attribute information "role A" and "role B" of the candidate video attribute "entity". If the user then selects another candidate video attribute "lines" that includes the four pieces of attribute information "line 1", "line 2", "line 3" and "line 4", the line information "line 1" and "line 2" that does not belong to "role A" or "role B" is removed, leaving the remaining attribute information "line 3" and "line 4". If the user selects "line 3", then "entity" and "lines" are together used as the target video attributes, and "role A", "role B" and "line 3" are together used as the target attribute information of the target video attributes.
The target video attribute is selected from the candidate video attributes according to the attribute information of the candidate video attributes in the target video, so that a foundation is laid for extracting the video content to be played according to the target video attribute subsequently.
S102, extracting video content to be played from the target video according to the target video attribute.
In an embodiment, when there is one target video attribute and it is the entity, a target video frame associated with the target entity information is determined according to the target entity information corresponding to the entity attribute and the pre-established association relationship between entity information and video frames, and the target video frame is extracted to generate the video content to be played. For example, assuming that the target entity information is "role A" and the association between "role A" and the 100th to 200th video frames of the target video has been established in advance, the 100th to 200th video frames of the target video are extracted to generate the video content to be played.
In another embodiment, when there is one target video attribute and it is the lines, the target video frames associated with the target line information are determined according to the target line information corresponding to the line attribute and the pre-established association relationship between line information and video frames, and the target video frames are extracted to generate the video content to be played. For example, if the target line information is "what is eaten at noon today" and the association between that line and the 20th to 40th video frames of the target video has been established in advance, the 20th to 40th video frames of the target video are extracted to generate the video content to be played.
In another embodiment, when there is one target video attribute and it is the barrage, the target video frames associated with the target barrage information are determined according to the target barrage information corresponding to the barrage attribute and the pre-established association relationship between barrage information and video frames, and the target video frames are extracted to generate the video content to be played. For example, assuming that the target barrage information is "good scenery" and the associations between "good scenery" and the 30th to 50th, 80th to 90th and 110th to 120th video frames of the target video have been established in advance, those video frames are extracted to generate the video content to be played.
In another embodiment, where there are at least two target video attributes, the case of the two attributes "entity" and "lines" is taken as an example. A first target video frame associated with the target entity information is determined according to the target entity information corresponding to the entity attribute and the pre-established association relationship between entity information and video frames; a second target video frame associated with the target line information is then determined from among the first target video frames according to the target line information corresponding to the line attribute and the pre-established association relationship between line information and video frames; and the second target video frame is extracted to generate the video content to be played.
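As a sketch of the extraction step, and assuming the frame index built earlier, the single-attribute cases reduce to a lookup while the multi-attribute case intersects the per-attribute frame sets. This is an illustrative reading of the embodiment, not code from the patent:

```python
def frames_to_play(index, selections):
    """Collect the video frames to extract for the selected target
    attribute information.

    index      -- mapping from attribute information to the frame
                  indices associated with it (see the indexing sketch)
    selections -- one list of selected attribute information per target
                  video attribute, e.g. [["role A", "role B"], ["line 3"]]
    """
    result = None
    for infos in selections:
        # frames matching any selected info of one attribute ...
        frames = set()
        for info in infos:
            frames.update(index.get(info, []))
        # ... intersected across attributes, mirroring the first/second
        # target video frame narrowing described above
        result = frames if result is None else result & frames
    return sorted(result or [])
```

Calling `frames_to_play(index, [["role A"]])` reproduces the single-entity case; adding `["line 3"]` narrows role A's frames to those that also carry that line.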
In this embodiment, a target video attribute is selected from candidate video attributes according to the attribute information of the candidate video attributes in the target video, the candidate video attributes including at least one of entities, lines or barrages, and the video content to be played is extracted from the target video according to the target video attribute. The video content the user wants is thus automatically located and played without the user manually controlling the playing progress, which improves video playing efficiency, supports multi-dimensional video skipping, and meets the user's personalized viewing needs.
Fig. 2A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure, suitable for the case where there is one target video attribute and it is the barrage; it further optimizes and expands the above technical solution and can be combined with the above optional embodiments.
As shown in fig. 2A, the video playing method disclosed in this embodiment may include:
s201, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, lines, or barrage.
S202, under the condition that the attribute of the target video is the bullet screen, determining the occurrence frequency of the candidate bullet screen information in the target video.
In one embodiment, the barrage resources sent by users in the target video are fed into an Optical Character Recognition (OCR) interface, the barrage content of the target video is analysed by an OCR algorithm to obtain the candidate barrage information contained in the target video, and the number of times each piece of candidate barrage information appears in the target video is counted. For example, assuming that 10 "highlight" barrages appear in the target video in total, the number of times "highlight" appears in the target video is 10.
S203, selecting target bullet screen information to be displayed from the candidate bullet screen information according to the occurrence times.
In one embodiment, the occurrence counts of the candidate barrage information in the target video are sorted in descending order, and the target occurrence counts are determined from the sorting result; for example, the three largest occurrence counts are taken as the target occurrence counts. The candidate barrage information corresponding to the target occurrence counts is then determined and used as the target barrage information to be displayed.
For example, assuming the target occurrence counts are "100", "95" and "90", and the corresponding candidate barrage information is "highlight scenario" for "100", "leading actor appears" for "95" and "first scene" for "90", then "highlight scenario", "leading actor appears" and "first scene" are taken as the target barrage information to be displayed.
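A compact sketch of this counting-and-ranking step; `Counter.most_common` does the descending sort, and `top_n=3` mirrors the "first three occurrence times" example above (the cut-off is illustrative, not fixed by the patent):

```python
from collections import Counter

def top_barrage_info(candidate_barrages, top_n=3):
    """Count how often each piece of candidate barrage information
    occurs in the target video and keep the most frequent entries as
    the target barrage information to display."""
    return Counter(candidate_barrages).most_common(top_n)

# e.g. a stream of recognised barrage texts, one entry per occurrence:
# top_barrage_info(["highlight scenario"] * 100
#                  + ["leading actor appears"] * 95
#                  + ["first scene"] * 90
#                  + ["hello"] * 2)
# -> [('highlight scenario', 100), ('leading actor appears', 95),
#     ('first scene', 90)]
```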
S204, extracting video content to be played from the target video according to the target bullet screen information.
In one implementation, the user selects at least one piece of the displayed target barrage information as index barrage information, and the video content to be played is extracted from the target video according to the index barrage information selected by the user and the pre-established association relationship between barrage information and video frames.
Optionally, S204 includes:
extracting a video frame comprising the target bullet screen information from the target video; and generating the video content to be played according to the extracted video frame.
Specifically, according to index barrage information selected by a user and a pre-established association relationship between the barrage information and the video frames, a target video frame associated with the index barrage information is determined, and the target video frame is extracted to generate video content to be played.
Extracting the video frames that include the target barrage information from the target video and generating the video content to be played from the extracted video frames realizes generating the video content to be played based on the barrage information selected by the user, meeting the user's personalized viewing needs.
Fig. 2B is an interface schematic diagram of target bullet screen information disclosed according to an embodiment of the present disclosure, as shown in fig. 2B, where 20, 21, and 22 respectively represent three target bullet screen information, and a user may select at least one of the three target bullet screen information as index bullet screen information, and further extract video content to be played from a target video 23 according to the index bullet screen information selected by the user and an association relationship between the preset bullet screen information and a video frame.
In this embodiment, when the target video attribute is the barrage, the occurrence counts of the candidate barrage information in the target video are determined, the target barrage information to be displayed is selected from the candidate barrage information according to those counts, and the video content to be played is then extracted from the target video according to the target barrage information. The video content the user wants is thus automatically located and played without the user manually controlling the playing progress, which improves video playing efficiency, supports video skipping based on the barrage dimension, and meets the user's personalized viewing needs. Moreover, because the target barrage information is determined from occurrence counts, the corresponding video frames are usually highlight clips, which improves the user's viewing experience.
On the basis of the above embodiment, the method further includes:
and under the condition that the target video attribute is the bullet screen, determining the total number of the bullet screens of all candidate video frames in the target video, and extracting the video content to be played from the target video according to the total number of the bullet screens.
In one embodiment, the total number of bullet screens of each candidate video frame is sorted in descending order from high to low according to the total number of bullet screens, and the total number of target bullet screens is determined according to the sorting result, for example, the total number of the first three bullet screens with the largest total number of bullet screens is taken as the total number of target bullet screens. And then determining candidate video frames corresponding to the target barrage according to the total number of the target barrages to be used as target video frames, and displaying the total number of the target barrages and the playing time of the target video frames, so that a user can select and determine the index target video frames to extract the video content to be played from the target video frames.
For example, if the user selects the target total number of bullet screens with the highest total number of bullet screens, the target video frames corresponding to the target total number of bullet screens are used as the index target video frames, and then the index target video frames are extracted from the target video frames and used as the video content to be played.
Fig. 2C is an interface schematic diagram of target barrage totals. As shown in fig. 2C, 24, 25 and 26 respectively represent three target barrage totals; the user may select at least one of them to determine the index target video frames, which are then extracted from the target video 23 as the video content to be played.
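A sketch of this optional embodiment, assuming the barrage index built earlier (barrage text mapped to the frames it appears on); ranking frames by how many barrages land on them is the whole computation:

```python
from collections import Counter

def top_barrage_frames(barrage_index, top_n=3):
    """Rank candidate video frames by their total number of barrages
    and keep the frames with the largest totals, to be displayed with
    their playing times for the user to pick from.

    barrage_index maps each piece of barrage information to the list of
    frame indices it appears in (see the indexing sketch earlier).
    """
    totals = Counter()
    for frames in barrage_index.values():
        totals.update(frames)          # one barrage per (info, frame) pair
    return totals.most_common(top_n)   # [(frame_idx, total), ...] descending
```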
Fig. 3A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure, suitable for the case where there is one target video attribute and it is an entity; it further optimizes and expands the above technical solution and can be combined with the above optional embodiments.
As shown in fig. 3A, the video playing method disclosed in this embodiment may include:
s301, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in the target video; the candidate video attributes include at least one of: entity, lines, or barrage.
S302, under the condition that the target video attribute is an entity, determining candidate entity information to be displayed in the target video.
In one implementation, the video frames of the target video are imported into a recognition interface in batches, the entity information contained in the target video is analysed frame by frame by a preset entity recognition algorithm, and the candidate entity information to be displayed is output. The total playing time of the video frames to which each piece of candidate entity information belongs is displayed together with that candidate entity information.
S303, selecting target entity information from the candidate entity information, and extracting video content to be played from the target video according to the target entity information.
In one implementation, the user selects at least one piece of the displayed candidate entity information as target entity information, and the video frames associated with the target entity information are extracted from the target video as the video content to be played, according to the target entity information selected by the user and the pre-established association relationship between entity information and video frames. The user's selection operation may be a touch operation, such as a single click, a double click or a drag, or a voice operation, for example speaking the voice instruction "I want to see the segment that includes XX".
Fig. 3B is an interface schematic diagram of candidate entity information disclosed according to an embodiment of the present disclosure, as shown in fig. 3B, where 27, 28, and 29 respectively represent three candidate entity information, and a user may select at least one of the three candidate entity information as target entity information, and further extract video content to be played from a target video 23 according to the target entity information selected by the user and an association relationship between the entity information and a video frame established in advance.
In this embodiment, when the target video attribute is an entity, the candidate entity information to be displayed in the target video is determined, target entity information is selected from the candidate entity information, and the video content to be played is extracted from the target video according to the target entity information. The video content the user wants is thus automatically located and played without the user manually controlling the playing progress, which improves video playing efficiency, supports video skipping based on the entity dimension, and meets the user's personalized viewing needs.
Fig. 4A is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure, suitable for the case where there is one target video attribute and it is the lines; it further optimizes and expands the above technical solution and can be combined with the above optional embodiments.
As shown in fig. 4A, the video playing method disclosed in this embodiment may include:
s401, selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video; the candidate video attributes include at least one of: entity, lines, or barrage.
S402, determining the heat value of candidate speech information in the target video under the condition that the attribute of the target video is speech.
In one embodiment, each video frame of the target video is recognized by an OCR technology or a speech recognition technology to determine the candidate speech-line information contained in the target video, and a heat value for each piece of candidate speech-line information is obtained by weighted calculation over factors such as the number of times it has been searched for, the number of times it has been browsed, and the number of pages that record it.
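The weighted calculation can be sketched as below; the patent names the factors (search count, browse count, recorded-page count) but not the weights, so the weight values here are illustrative assumptions:

```python
def line_heat(search_count, browse_count, page_count,
              weights=(0.5, 0.3, 0.2)):
    """Heat value of one piece of candidate speech-line information as a
    weighted sum of the factors named above. The default weights are
    assumptions for illustration, not values from the patent."""
    w_search, w_browse, w_page = weights
    return (w_search * search_count
            + w_browse * browse_count
            + w_page * page_count)

# e.g. line_heat(search_count=1200, browse_count=800, page_count=40)
# -> 848.0; the lines with the largest heat values become the target
# speech-line information in S403.
```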
S403, selecting target speech information to be displayed from the candidate speech information according to the heat value.
In one embodiment, the candidate speech-line information is sorted in descending order of heat value, and the target speech-line information is determined from the sorting result; for example, the candidate speech-line information corresponding to the three largest heat values is used as the target speech-line information. The target speech-line information is displayed together with the playing times of the video frames it belongs to.
In another embodiment, the candidate speech-line information is matched against preset classical lines, and any candidate speech-line information that matches a classical line is used as the target speech-line information. The classical lines themselves are determined according to line heat values.
S404, extracting video content to be played from the target video according to the target speech information.
In one implementation, the user selects at least one piece of the displayed target speech-line information as index speech-line information, and the video content to be played is extracted from the target video according to the index speech-line information selected by the user and the pre-established association relationship between speech-line information and video frames.
Fig. 4B is an interface schematic diagram of target speech information disclosed according to an embodiment of the present disclosure, as shown in fig. 4B, where 30, 31, and 32 respectively represent three target speech information, and a user may select at least one of the three target speech information as index speech information, and further extract video content to be played from a target video 23 according to the index speech information selected by the user and an association relationship between pre-established speech information and video frames.
In this embodiment, when the target video attribute is the lines, the heat values of the candidate speech-line information in the target video are determined, the target speech-line information to be displayed is selected from the candidate speech-line information according to the heat values, and the video content to be played is extracted from the target video according to the target speech-line information. The video content the user wants is thus automatically located and played without the user manually controlling the playing progress, which improves video playing efficiency, supports video skipping based on the line dimension, and meets the user's personalized viewing needs. Moreover, because the target speech-line information is determined from heat values, the corresponding video frames are usually highlight clips, which improves the user's viewing experience.
On the basis of the above embodiment, before "determining the heat value of candidate speech-line information in the target video" in S402, the method further includes:
A) In the case that the target video has subtitles, optical character recognition is performed on the video frames included in the target video to determine the candidate speech-line information contained in the target video.
Specifically, when the target video has subtitles, optical character recognition is performed on the video content within the position coordinates of the subtitles in the target video, and the candidate speech-line information contained in the target video is determined.
B) In the case that the target video has no subtitles, speech recognition is performed on the video frames included in the target video to determine the candidate speech-line information contained in the target video.
Specifically, under the condition that the target video does not have subtitles, voice recognition is performed on video frames included in the target video, audio data of each video frame is converted into text information, and candidate speech-line information included in the target video is determined according to the text information obtained through the voice recognition.
Thus, when the target video has subtitles, optical character recognition is performed on its video frames to determine the candidate speech-line information; when it has no subtitles, speech recognition is performed instead. The candidate speech-line information of the target video can therefore be determined whether or not the target video includes subtitles, which improves the flexibility and reliability of determining the candidate speech-line information.
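A sketch of this subtitle-or-speech branch under stated assumptions: the `video` object and the two recognisers are hypothetical stand-ins, since the patent names OCR and speech recognition as techniques without fixing an API:

```python
def candidate_lines(video, ocr_lines, asr_lines):
    """Determine the candidate speech-line information per cases A and B
    above: OCR the subtitle region when subtitles exist, otherwise fall
    back to speech recognition on the audio.

    video     -- hypothetical object exposing has_subtitles,
                 subtitle_coords and audio_track
    ocr_lines -- OCR pass over frames restricted to given coordinates
    asr_lines -- speech-to-text pass over the audio data
    """
    if video.has_subtitles:
        # case A: recognise only the text inside the subtitle coordinates
        return ocr_lines(video, region=video.subtitle_coords)
    # case B: no subtitles, so convert each frame's audio into text
    return asr_lines(video.audio_track)
```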
Fig. 5 is a flowchart of a video playing method disclosed according to an embodiment of the present disclosure, which is suitable for a case where the number of target video attributes is at least two, and is further optimized and expanded based on the foregoing technical solution, and can be combined with the foregoing optional embodiments.
As shown in fig. 5, the video playing method disclosed in this embodiment may include:
s501, acquiring target attribute information of the candidate video attribute selected from the attribute information of any candidate video attribute in the target video.
In one embodiment, if the user wants to select at least two target video attributes from the candidate video attributes, at least one piece of target attribute information of whichever candidate video attribute the user selects first is acquired. For example, the user first selects the attribute information "role A" and "role B" of the candidate video attribute "entity", and "role A" and "role B" are then taken as the target attribute information.
S502, according to the target attribute information of the candidate video attribute, screening the attribute information of other candidate video attributes except the candidate video attribute to obtain the residual attribute information of other candidate video attributes.
In one embodiment, the attribute information of the candidate video attributes other than the selected one is screened according to the acquired target attribute information: attribute information in the other candidate video attributes that conflicts with the target attribute information is eliminated, only the attribute information that intersects with the target attribute information is retained as the remaining attribute information, and the remaining attribute information is displayed to the user.
For example, if the target attribute information is "role A" and "role B", and the attribute information of the "lines" attribute includes "line 1", "line 2", "line 3" and "line 4", and it is determined from the correspondence between lines and roles that "line 1" and "line 2" are spoken by "role C", then "line 1" and "line 2" conflict with "role A" and "role B" and are removed, and "line 3" and "line 4" are taken as the remaining attribute information.
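The screening step above can be sketched as follows; `owner_of` is a hypothetical lookup from a piece of attribute information to the entities it belongs to (e.g. which role speaks a line), standing in for the correspondence between lines and roles mentioned in the example:

```python
def screen_remaining(candidate_infos, target_infos, owner_of):
    """Screen another attribute's information against the already
    selected target attribute information (S502): drop entries that
    conflict with it and keep those that intersect.

    owner_of -- hypothetical lookup returning the set of entities a
                piece of attribute information belongs to, e.g.
                owner_of("line 1") -> {"role C"}
    """
    targets = set(target_infos)
    return [info for info in candidate_infos if owner_of(info) & targets]

# With targets {"role A", "role B"} and lines 1-4 where lines 1 and 2
# belong to "role C", the remaining attribute information is
# ["line 3", "line 4"], matching the example above.
```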
S503, determining the target video attribute and the target attribute information of the target video attribute according to the target attribute information of the candidate video attribute and the residual attribute information of other candidate video attributes.
In one embodiment, the user selects at least one piece of remaining attribute information from the remaining attribute information of at least one other candidate video attribute according to the remaining attribute information of the other candidate video attributes displayed, and uses the selected at least one other candidate video attribute as a secondary candidate video attribute and uses the selected at least one remaining attribute information as secondary attribute information according to the selection operation of the user. And then taking the candidate video attribute and the secondary candidate video attribute as target video attributes, and taking target attribute information and secondary attribute information of the candidate video attributes as target attribute information of the target video attributes.
For example, assuming that the user first selects the attribute information "role A" and "role B" of the candidate video attribute "entity", and then selects the remaining attribute information "line 3" and "line 4" of the other candidate video attribute "lines", the "entity" and "lines" are together taken as the target video attributes, and "role A", "role B", "line 3" and "line 4" are together taken as the target attribute information of the target video attributes.
S504, extracting video content to be played from the target video according to the target video attribute and the target attribute information of the target video attribute.
In this embodiment, the target attribute information of a candidate video attribute selected from the attribute information of any candidate video attribute in the target video is acquired; the attribute information of the other candidate video attributes is screened according to that target attribute information to obtain their remaining attribute information; and the target video attributes and their target attribute information are determined from the target attribute information of the first candidate video attribute and the remaining attribute information of the other candidate video attributes. This ensures that the pieces of target attribute information do not conflict with one another during playing, avoids suddenly jumping to video frames of other target attribute information while the video frames of one piece of target attribute information are playing, prevents discontinuous video playing, and gives the user a better viewing experience.
Fig. 6 is a schematic structural diagram of a video playing apparatus according to an embodiment of the present disclosure, which may be applied to a case of performing skip playing on a target video. The device of the embodiment can be implemented by software and/or hardware, and can be integrated on any electronic equipment with computing capability.
As shown in fig. 6, the video playing apparatus 60 disclosed in this embodiment may include a target video attribute selection module 61 and a video content extraction module 62, where:
the target video attribute selection module 61 is configured to select a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video; the candidate video attributes include at least one of: entities, lines, or barrages;
and a video content extracting module 62, configured to extract video content to be played from the target video according to the target video attribute.
Optionally, the video content extracting module 62 is specifically configured to:
determining the occurrence frequency of candidate barrage information in the target video under the condition that the attribute of the target video is a barrage;
selecting target bullet screen information to be displayed from the candidate bullet screen information according to the occurrence times;
and extracting video content to be played from the target video according to the target bullet screen information.
Optionally, the video content extracting module 62 is further specifically configured to:
extracting a video frame comprising the target bullet screen information from the target video;
and generating the video content to be played according to the extracted video frame.
Optionally, the video content extracting module 62 is further specifically configured to:
determining candidate entity information to be displayed in the target video under the condition that the target video attribute is an entity;
and selecting target entity information from the candidate entity information, and extracting video content to be played from the target video according to the target entity information.
Optionally, the video content extracting module 62 is further specifically configured to:
determining the heat value of candidate speech information in the target video under the condition that the attribute of the target video is speech;
selecting target speech-line information to be displayed from the candidate speech-line information according to the heat value;
and extracting video content to be played from the target video according to the target speech information.
Optionally, the apparatus further includes a speech information determining module, specifically configured to:
under the condition that the target video has subtitles, carrying out optical character recognition on video frames included in the target video, and determining the candidate speech-line information included in the target video;
and under the condition that the target video does not have subtitles, performing voice recognition on video frames included in the target video, and determining the candidate speech-line information included in the target video.
Optionally, the target video attribute selecting module 61 is specifically configured to:
acquiring target attribute information of a candidate video attribute selected from attribute information of any candidate video attribute in a target video;
according to the target attribute information of the candidate video attribute, screening the attribute information of other candidate video attributes except the candidate video attribute to obtain the residual attribute information of other candidate video attributes;
and determining the target video attribute and the target attribute information of the target video attribute according to the target attribute information of the candidate video attribute and the residual attribute information of other candidate video attributes.
The video playing device 60 disclosed in the embodiment of the present disclosure can execute the video playing method disclosed in the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method. Reference may be made to the description of any method embodiment of the disclosure for a matter not explicitly described in this embodiment.
In the technical solution of the present disclosure, the acquisition, storage and application of the personal information of the users involved all comply with the provisions of relevant laws and regulations and do not violate public order or good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 7 illustrates a schematic block diagram of an example electronic device 700 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in FIG. 7, the device 700 comprises a computing unit 701, which may perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 may also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Various components in the device 700 are connected to the I/O interface 705, including: an input unit 706 such as a keyboard, a mouse, or the like; an output unit 707 such as various types of displays, speakers, and the like; a storage unit 708 such as a magnetic disk, optical disk, or the like; and a communication unit 709 such as a network card, modem, wireless communication transceiver, etc. The communication unit 709 allows the device 700 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), blockchain networks, and the internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network; their relationship arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that the various forms of flow shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed herein can be achieved; the present disclosure is not limited in this respect.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations, and substitutions may be made according to design requirements and other factors. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.
Claims (17)
1. A video playback method, comprising:
selecting a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video, wherein the candidate video attributes comprise at least one of: an entity, a line, or a bullet-screen comment;
and extracting video content to be played from the target video according to the target video attribute.
2. The method of claim 1, wherein the extracting video content to be played from the target video according to the target video attribute comprises:
determining the number of occurrences of candidate bullet-screen information in the target video under the condition that the target video attribute is a bullet-screen comment;
selecting target bullet-screen information to be displayed from the candidate bullet-screen information according to the number of occurrences;
and extracting video content to be played from the target video according to the target bullet-screen information.
3. The method of claim 2, wherein the extracting video content to be played from the target video according to the target bullet-screen information comprises:
extracting, from the target video, video frames comprising the target bullet-screen information;
and generating the video content to be played according to the extracted video frames.
4. The method of claim 1, wherein the extracting video content to be played from the target video according to the target video attribute comprises:
determining candidate entity information to be displayed in the target video under the condition that the target video attribute is an entity;
and selecting target entity information from the candidate entity information, and extracting video content to be played from the target video according to the target entity information.
5. The method of claim 1, wherein the extracting video content to be played from the target video according to the target video attribute comprises:
determining a heat value of candidate speech-line information in the target video under the condition that the target video attribute is a line;
selecting target speech-line information to be displayed from the candidate speech-line information according to the heat value;
and extracting video content to be played from the target video according to the target speech-line information.
6. The method of claim 5, further comprising, before the determining of the heat value of the candidate speech-line information in the target video:
under the condition that the target video has subtitles, performing optical character recognition on video frames included in the target video and determining the candidate speech-line information included in the target video;
and under the condition that the target video does not have subtitles, performing speech recognition on audio of the target video and determining the candidate speech-line information included in the target video.
7. The method according to any one of claims 1-6, wherein the selecting a target video attribute from the candidate video attributes according to attribute information of the candidate video attributes in the target video comprises:
acquiring target attribute information selected from the attribute information of any one candidate video attribute in the target video;
screening, according to the target attribute information of the candidate video attribute, the attribute information of the candidate video attributes other than the candidate video attribute, to obtain remaining attribute information of the other candidate video attributes;
and determining the target video attribute and target attribute information of the target video attribute according to the target attribute information of the candidate video attribute and the remaining attribute information of the other candidate video attributes.
8. A video playback apparatus comprising:
the target video attribute selection module is configured to select a target video attribute from candidate video attributes according to attribute information of the candidate video attributes in a target video, wherein the candidate video attributes comprise at least one of: an entity, a line, or a bullet-screen comment;
and the video content extraction module is configured to extract video content to be played from the target video according to the target video attribute.
9. The apparatus of claim 8, wherein the video content extraction module is specifically configured to:
determine the number of occurrences of candidate bullet-screen information in the target video under the condition that the target video attribute is a bullet-screen comment;
select target bullet-screen information to be displayed from the candidate bullet-screen information according to the number of occurrences;
and extract video content to be played from the target video according to the target bullet-screen information.
10. The apparatus of claim 9, wherein the video content extraction module is further specifically configured to:
extract, from the target video, video frames comprising the target bullet-screen information;
and generate the video content to be played according to the extracted video frames.
11. The apparatus of claim 8, wherein the video content extraction module is further specifically configured to:
determine candidate entity information to be displayed in the target video under the condition that the target video attribute is an entity;
and select target entity information from the candidate entity information and extract video content to be played from the target video according to the target entity information.
12. The apparatus of claim 8, wherein the video content extraction module is further specifically configured to:
determine a heat value of candidate speech-line information in the target video under the condition that the target video attribute is a line;
select target speech-line information to be displayed from the candidate speech-line information according to the heat value;
and extract video content to be played from the target video according to the target speech-line information.
13. The apparatus according to claim 12, further comprising a speech-line information determining module, specifically configured to:
under the condition that the target video has subtitles, perform optical character recognition on video frames included in the target video and determine the candidate speech-line information included in the target video;
and under the condition that the target video does not have subtitles, perform speech recognition on audio of the target video and determine the candidate speech-line information included in the target video.
14. The apparatus according to any one of claims 8-13, wherein the target video attribute selection module is specifically configured to:
acquire target attribute information selected from the attribute information of any one candidate video attribute in the target video;
screen, according to the target attribute information of the candidate video attribute, the attribute information of the candidate video attributes other than the candidate video attribute, to obtain remaining attribute information of the other candidate video attributes;
and determine the target video attribute and target attribute information of the target video attribute according to the target attribute information of the candidate video attribute and the remaining attribute information of the other candidate video attributes.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-7.
17. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
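The attribute-selection step of claims 1 and 7 can be illustrated with a minimal sketch. The claims do not fix a data model or a selection policy, so the `AttributeInfo` container, the overlap-based screening, and the "most remaining information wins" rule below are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class AttributeInfo:
    # Hypothetical container; the claims speak only of "attribute information".
    kind: str                            # "entity", "line", or "bullet_screen"
    items: list[str] = field(default_factory=list)


def select_target_attribute(candidates: list[AttributeInfo]) -> AttributeInfo:
    """Sketch of claim 7: take the target information of one candidate
    attribute, screen the other candidates' information against it, and keep
    the attribute with the most remaining (non-overlapping) information."""
    if not candidates:
        raise ValueError("at least one candidate video attribute is required")
    seed, others = candidates[0], candidates[1:]
    seen = set(seed.items)
    # Screening step: drop information already covered by the seed attribute.
    screened = [
        AttributeInfo(c.kind, [it for it in c.items if it not in seen])
        for c in others
    ]
    # Assumed policy: the attribute still carrying the most information wins.
    return max([seed, *screened], key=lambda a: len(a.items))
```

The winning attribute would then be dispatched to the matching per-attribute extraction branch, as in the sketches that follow.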
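The bullet-screen branch of claims 2-3 (mirrored by apparatus claims 9-10) reduces to counting, ranking, and frame collection. The `(timestamp, text)` shape of the comments, the `frames` mapping, and the `top_k` cutoff are assumptions; the claims only require selection by number of occurrences.

```python
from collections import Counter


def pick_bullet_screen_content(danmaku, frames, top_k=3):
    """Sketch of claims 2-3: count how often each candidate bullet-screen
    comment occurs, keep the most frequent ones, and collect the video frames
    in which a selected comment is displayed."""
    counts = Counter(text for _, text in danmaku)
    targets = {text for text, _ in counts.most_common(top_k)}
    # Claim 3: extract the frames that comprise the target bullet-screen info.
    clip_frames = [
        frames[ts]
        for ts, text in danmaku
        if text in targets and ts in frames
    ]
    return targets, clip_frames
```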
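The entity branch of claim 4 is analogous. The claim leaves the selection policy and the entity index open, so the `choose` callable and the appearance map below are placeholders; by default the most frequently appearing entity is taken.

```python
def pick_entity_content(entity_appearances, frames, choose=max):
    """Sketch of claim 4: select a target entity from the candidate entities
    detected in the video and extract the content in which it appears.
    `entity_appearances` maps an entity name to the timestamps at which it is
    detected; both the map and the default policy are assumptions."""
    target = choose(entity_appearances, key=lambda e: len(entity_appearances[e]))
    return target, [frames[ts] for ts in entity_appearances[target] if ts in frames]
```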
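Claims 5-6 branch on the presence of subtitles before ranking lines by heat. The `ocr`, `asr`, and `heat_of` callables stand in for real engines and for the popularity signal; none of them is named by the patent, so they are injected here as assumptions rather than concrete implementations.

```python
def collect_candidate_lines(video_frames, audio, has_subtitles, ocr, asr):
    """Sketch of claim 6: run OCR over the frames when the video is subtitled,
    otherwise run speech recognition on the audio track."""
    if has_subtitles:
        return [line for frame in video_frames for line in ocr(frame)]
    return asr(audio)


def pick_hot_lines(candidate_lines, heat_of, top_k=3):
    """Sketch of claim 5: rank candidate speech-line information by its heat
    value and keep the hottest entries for extraction."""
    return sorted(candidate_lines, key=heat_of, reverse=True)[:top_k]
```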
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110857081.7A CN113873323B (en) | 2021-07-28 | 2021-07-28 | Video playing method, device, electronic equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113873323A true CN113873323A (en) | 2021-12-31 |
CN113873323B CN113873323B (en) | 2023-08-29 |
Family
ID=78990273
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110857081.7A Active CN113873323B (en) | 2021-07-28 | 2021-07-28 | Video playing method, device, electronic equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113873323B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115119039A (en) * | 2022-06-29 | 2022-09-27 | 北京奇艺世纪科技有限公司 | Video playing system, method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105163178A (en) * | 2015-08-28 | 2015-12-16 | 北京奇艺世纪科技有限公司 | Method and device for locating video playing position |
CN106095804A (en) * | 2016-05-30 | 2016-11-09 | 维沃移动通信有限公司 | Video clip processing method, positioning method and terminal
CN110225369A (en) * | 2019-07-16 | 2019-09-10 | 百度在线网络技术(北京)有限公司 | Video selection playback method, device, equipment and readable storage medium
CN111988663A (en) * | 2020-08-28 | 2020-11-24 | 北京百度网讯科技有限公司 | Method, device and equipment for positioning video playing node and storage medium |
CN112328829A (en) * | 2020-10-27 | 2021-02-05 | 维沃移动通信(深圳)有限公司 | Video content retrieval method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||