
CN111209437B - Label processing method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN111209437B
Authority
CN
China
Prior art keywords
video
tag
time
voice input
user
Legal status: Active
Application number
CN202010030363.5A
Other languages
Chinese (zh)
Other versions
CN111209437A (en)
Inventor
杨广煜
Current Assignee: Tencent Technology Shenzhen Co Ltd
Original Assignee: Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010030363.5A
Publication of CN111209437A
Application granted
Publication of CN111209437B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/7867 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title and artist information, manually generated time, location and usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The embodiments of the present application disclose a tag processing method and apparatus, a storage medium, and an electronic device. The method includes: displaying, in a video playing client, a video playing page corresponding to a target video, the video playing page including a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, the video tag including the current video time and the video frame image information corresponding to the current video time; and, when a tag viewing operation for the tag viewing control is detected, displaying a video tag list that includes at least one generated video tag arranged in a predetermined order. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing.

Description

Label processing method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of computer technologies, and in particular to a tag processing method and apparatus, a storage medium, and an electronic device.
Background
An electronic bookmark is a mark added at the position where a reader stops reading an electronic book. With the added electronic bookmark, the next time the reader opens the electronic book, the position where reading was last interrupted can be found quickly and reading can continue from there. In the prior art, a reader can add an electronic bookmark manually wherever desired, or the reading application can add one automatically at the position where reading was interrupted. However, in the prior art, electronic bookmarks apply only to the reading field, and the scope in which electronic bookmarks can be generated is therefore narrow.
Disclosure of Invention
The embodiments of the present application provide a tag processing method and apparatus, a storage medium, and an electronic device, which can improve the flexibility of tag processing.
An embodiment of the present application provides a tag processing method, including the following steps:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video tag and a tag viewing control used for viewing the video tag;
when voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, wherein the video tag comprises: the current video time and the video frame image information corresponding to the current video time;
When a tag viewing operation for the tag viewing control is detected, a video tag list is presented, the video tag list including at least one generated video tag arranged in a predetermined order.
Correspondingly, an embodiment of the present application further provides a tag processing apparatus, including:
the display module is used for displaying a video playing page corresponding to the target video in the video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video tag and a tag viewing control used for viewing the video tag;
the generating module is used for generating a video tag corresponding to the current video time when the voice input operation aiming at the voice input control is detected, and the video tag comprises: the current video time and the video frame image information corresponding to the current video time;
and the display module is used for displaying a video tag list when the tag viewing operation aiming at the tag viewing control is detected, wherein the video tag list comprises at least one generated video tag arranged according to a preset sequence.
Optionally, in some embodiments, the generating module may include an acquiring sub-module and a generating sub-module as follows:
The acquisition sub-module is used for acquiring video frame image information corresponding to the current video time, a video unit to be marked corresponding to the current video time and current recorded voice information when voice input operation aiming at the voice input control is detected;
and the generation sub-module is used for generating a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information.
Optionally, in some embodiments, the acquiring sub-module may include a first acquisition sub-module and a second acquisition sub-module as follows:
the first acquisition sub-module is used for acquiring current video time, video frame image information corresponding to the current video time and voice information recorded by the current voice input control when voice input operation for the voice input control is detected;
and the second acquisition sub-module is used for acquiring the video unit to be marked corresponding to the current video time from the target video based on the voice information.
At this time, the second acquisition sub-module may be specifically configured to: when the voice information is detected to include first tag type information, acquire the current video frame image corresponding to the current video time from the target video, and determine the current video frame image as the video unit to be marked.
At this time, the second acquisition sub-module may be specifically configured to: when the voice information includes second tag type information, determine the operation start time point and the operation end time point corresponding to the voice input operation, acquire a current video clip from the target video based on the operation start time point and the operation end time point, and determine the current video clip as the video unit to be marked.
At this time, the generating sub-module may be specifically configured to, when detecting that the voice information includes tag content voice information, convert the tag content voice information into tag content text information, and generate a video tag corresponding to the video unit to be marked based on the tag content text information and the video frame image information.
At this time, the acquiring sub-module may be specifically configured to: when a voice input operation for the voice input control is detected, acquire the current video time and the video frame image information corresponding to the current video time, mute the audio in the target video, play the target video with the audio muted, and acquire the voice information currently recorded through the voice input control.
Optionally, in some embodiments, the tag processing apparatus may further include a first acquiring module and an arranging module, as follows:
the first acquisition module is used for acquiring video tags corresponding to a plurality of videos in a video set, wherein the video set comprises videos corresponding to a plurality of levels;
the arrangement module is used for arranging a plurality of video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
Optionally, in some embodiments, the tag processing apparatus may further include a determining module, a second acquiring module, and a jumping module, as follows:
the determining module is used for determining target video time corresponding to the target video label and video to be played corresponding to the target video label when the skip play operation for the target video label in the video label list is detected;
the second acquisition module is used for acquiring a video clip to be played from the video to be played based on the target video time;
and the jump module is used for jumping and playing the video clip to be played.
At this time, the acquiring sub-module may be specifically configured to: when a voice input operation for the voice input control is detected, detect the user's login state with respect to the video playing client to obtain user login status information, and, when the user login status information indicates that the user has logged in to the video playing client, acquire the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information.
In addition, an embodiment of the present application further provides a computer storage medium storing a plurality of instructions suitable for being loaded by a processor to execute the steps of any tag processing method provided by the embodiments of the present application.
In addition, an embodiment of the present application further provides an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of any tag processing method provided by the embodiments of the present application.
In the embodiments of the present application, a video playing page corresponding to a target video can be displayed in a video playing client, the video playing page including a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags. When a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, the video tag including the current video time and the video frame image information corresponding to the current video time. When a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, the video tag list including at least one generated video tag arranged in a predetermined order. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a tag processing system according to an embodiment of the present application;
FIG. 2 is a first flowchart of a tag processing method according to an embodiment of the present application;
FIG. 3 is a second flowchart of a tag processing method according to an embodiment of the present application;
FIG. 4 is a third flowchart of a tag processing method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a first video tag list provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of video tag processing according to an embodiment of the present application;
FIG. 7 is a schematic diagram of determining a user login condition according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a second video tag list provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a third video tag list provided by an embodiment of the present application;
FIG. 10 is a schematic diagram of a video playing page according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a tag processing apparatus according to an embodiment of the present application;
FIG. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements throughout, the principles of the present application are illustrated in an appropriate computing environment. The following description is based on illustrative embodiments of the application and should not be taken as limiting other embodiments of the application not described in detail herein.
In the description that follows, specific embodiments of the application are described with reference to steps and symbols executed by one or more computers, unless indicated otherwise. These steps and operations are therefore referred to, in several instances, as being computer-executed: the computer's processing unit manipulates electronic signals that represent data in a structured form. This manipulation transforms the data or maintains it at locations in the computer's memory system, which may reconfigure or otherwise alter the computer's operation in a manner well known to those skilled in the art. The data structures in which the data are maintained are physical locations of the memory that have particular properties defined by the data format. However, while the principles of the application are described in the foregoing text, this is not meant to be limiting, and those skilled in the art will appreciate that the various steps and operations described below may also be implemented in hardware.
The term "module" as used herein may be considered a software object executing on the computing system. The various components, modules, engines, and services described herein may be viewed as implementing objects on the computing system. The apparatus and methods described herein may be implemented in software, but may also be implemented in hardware, and are within the scope of the present application.
The terms "first," "second," and "third," etc. in this disclosure are used for distinguishing between different objects and not for describing a particular sequential order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to the particular steps or modules listed and certain embodiments may include additional steps or modules not listed or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The embodiments of the present application provide a tag processing method and apparatus, a storage medium, and an electronic device. Specifically, the embodiments provide a tag processing method suitable for an electronic device. The electronic device may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, a personal computer, a smart TV, a set-top box, or the like; the electronic device may also be a server, which may be a single server or a server cluster composed of multiple servers.
For example, the tag processing means may be integrated in the terminal or the server.
In the embodiment of the application, the tag processing method can be independently executed by the terminal or the server, or can be jointly executed by the terminal and the server.
Referring to fig. 1, for example, an electronic device may be configured to display a video playing page corresponding to a target video in a video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag, and when a voice input operation for the voice input control is detected, the video tag corresponding to a current video time is generated, the video tag includes a current video time and video frame image information corresponding to the current video time, and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, where the video tag list includes at least one generated video tag arranged in a predetermined order.
In another embodiment, a video playing page corresponding to the target video may be displayed on a video playing client installed on the terminal, and when a voice input operation of the user for the voice input control is detected, the currently recorded voice information is obtained and sent to the server. The server can generate a video tag corresponding to the current video time according to the received voice information, and return the video tag to the video playing client.
It will be appreciated that, in another embodiment, the steps of the tag processing method may also be performed entirely by the terminal; in that case the tag processing apparatus may be integrated in the terminal in the form of a video playing client, and the video playing client may also support functions such as professional video editing.
The following provides a detailed description. The order in which the following embodiments are described is not intended to limit which embodiments are preferred.
The embodiment of the application will be described from the perspective of a tag processing device, which may be integrated in a terminal or in a server.
The tag processing method provided by the embodiments of the present application may be executed by a processor of a terminal. As shown in fig. 2, the specific flow of the tag processing method may be as follows:
201. Display a video playing page corresponding to the target video in the video playing client.
The video playing client is a client application that provides a video playing service to users; it can be installed on a terminal and operate in cooperation with a server. A user can log in to the video playing client with a user account, so that the client can record data such as the user's video playing history and viewing preferences. The video playing client is not limited to clients whose main purpose is playing video; it may be any client that includes a video playing function, for example a video client, a browser client, and so on.
The target video may be the video currently played in the video playing client. For example, when the user is watching video 1 through the video playing client, video 1 may be determined as the target video. The embodiments of the present application do not limit the video content, video format, or the like of the target video.
The video playing page is a page on which video can be played for the user. For example, as shown in fig. 10, the target video may be played in the video playing page so that the user can watch it, and the video playing page may further include a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags.
The video tag is an electronic bookmark that can guide the user to a specific video frame or video clip in a video. For example, if the user is watching episode 2 of TV series 1 and adds a video tag at the 20th minute and 30th second of that episode, the next time the user can directly use the video tag to jump to and play the video clip starting at the 20th minute and 30th second of episode 2 of TV series 1, as sketched below.
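For illustration only (this sketch is not part of the patent text), jump playback from a tag can be as simple as seeking the player, assuming a browser-based client and the standard HTMLMediaElement API; the names are hypothetical:

```typescript
// Minimal sketch of jump playback from a video tag; tagTimeSeconds would come
// from the stored tag (e.g. 1230 for the 20th minute and 30th second).
function jumpToTag(video: HTMLVideoElement, tagTimeSeconds: number): void {
  video.currentTime = tagTimeSeconds; // seek to the tagged position
  void video.play();                  // resume playback from there
}
```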
The user may also add tag content to a video tag. The tag content may be a summary of the currently playing video content, a caption added to the currently playing video, the user's viewing impressions, and so on. For example, when the user is watching episode 2 of TV series 1 and plot point 1 occurs at the 20th minute and 30th second, the user may add a video tag whose tag content is "plot point 1"; when the user later views the video tag list, the tag content "plot point 1" corresponding to the 20th minute and 30th second of episode 2 and the tag content "plot point 2" corresponding to the 30th minute and 30th second of episode 2 can both be displayed, so that the user can intuitively see which plot points occur in the video and when each of them occurs.
The voice input control may be a control in the video playing page that guides the user to input voice. Because the embodiments of the present application generate video tags from the voice information input by the user, the voice input control can also serve as the control that triggers a video tag processing request. For example, as shown in fig. 10, the voice input control may take the form of a button: the user triggers the video tag processing request and records voice information by long-pressing the button-shaped voice input control. The voice input control may take various forms, for example an input box, a button, an icon, and so on.
The tag viewing control may be a control in the video playing page that guides the user to view video tags. For example, as shown in fig. 10, the tag viewing control may take the form of a button: by clicking the button-shaped tag viewing control, the user triggers a request to display the video tag list, and the video tag list is then displayed on the terminal interface. The tag viewing control may likewise take various forms, for example an input box, a button, an icon, and so on.
In practical applications, for example, as shown in fig. 10, user 1 may log in to the video playing client with account 1 and watch the target video through the client, the target video being played in the video playing page. The video playing page may further include a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags.
In an embodiment, a video tag may point not only at a single moment in the target video but also at a time period in it. For example, a video tag may correspond to the 20th minute and 30th second of episode 3 of TV series 1, or to the 20th to 30th minute of that episode; in the latter case, the video tag can be used to jump to and play the video clip covering the 20th to 30th minute of episode 3 of TV series 1.
202. When a voice input operation for the voice input control is detected, generate a video tag corresponding to the current video time.
The voice input operation may be an operation in which the user inputs voice through the voice input control. For example, the user may long-press the button-shaped voice input control and speak during the long press; the user's speech is recorded and stored, and the long press is determined to be a voice input operation for the voice input control. As another example, the user may long-press the control while audio is played from a playback device; the played audio is recorded and stored, and this long press may likewise be determined to be a voice input operation for the voice input control.
Constructing a video tag requires the current video time and the video frame image information corresponding to the current video time. For example, if the video tag is constructed while the user is watching the 20th minute and 30th second of episode 3 of season 2 of TV series 1, the current video time corresponding to the video tag is the 20th minute and 30th second of episode 3 of season 2 of TV series 1.
The video frame image information may be feature information of the target video at the current video time. In the example above, the video frame image information may include the video title of episode 3 of season 2 of TV series 1, the unique video identifier of that episode, the current video time (the 20th minute and 30th second), a timestamp, and so on.
When the user performs the voice input operation, the current video time is the playback position of the target video in the video playing page. For example, if episode 3 of season 2 of TV series 1 is playing in the video playing page and the user long-presses the button-shaped voice input control at the 20th minute and 30th second to input voice information, the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is determined as the current video time.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the voice input control and saying "bookmark", a video tag corresponding to the 20th minute and 30th second of episode 3 of season 2 of TV series 1 can be generated.
In an embodiment, after the video tag is generated, the user may be told through a message prompt box that the tag was generated successfully; the box may read "tag generated successfully". Alternatively, a voice prompt such as "tag created" may be played to indicate to the user that the video tag has been created successfully.
In an embodiment, a generated video tag can also be shared. For example, after user 1 shares with user 2 the video tag for the 20th minute of video 1, user 2's video playing client can prompt user 2 that user 1 has shared the video tag corresponding to the 20th minute of video 1.
In one embodiment, since the video tag is constructed from the input voice information, the current video time, and the video frame image information, these three items need to be acquired. Specifically, the step of generating a video tag corresponding to the current video time when a voice input operation for the voice input control is detected may include:
when a voice input operation for the voice input control is detected, acquiring the video frame image information corresponding to the current video time and the currently recorded voice information;
and generating a video tag corresponding to the current video time based on the voice information and the video frame image information.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control, a video tag generation request is triggered, and the user may then say "bookmark" into the voice input control. According to the video tag generation request, the terminal determines the 20th minute and 30th second of episode 3 of season 2 of TV series 1 as the current video time and acquires the corresponding video frame image information, which may include one or more of: the video title of episode 3 of season 2 of TV series 1, the unique video identifier of that episode, the current video time, and a timestamp. The terminal also acquires the voice information "bookmark" recorded through the voice input control, and then generates a video tag corresponding to the 20th minute and 30th second of episode 3 of season 2 of TV series 1 from the voice information, the current video time, and the video frame image information.
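As a rough illustration of this client-side flow (not part of the patent disclosure), the following TypeScript sketch captures the current video time and frame information at the moment of the long press, then records the voice and builds the tag; every name in it (FrameInfo, recordVoice, buildVideoTag) is a hypothetical placeholder:

```typescript
// Hypothetical shape of the "video frame image information" described above.
interface FrameInfo {
  videoId: string;      // unique video identifier
  videoTitle: string;   // e.g. "TV series 1, season 2, episode 3"
  currentTime: number;  // current video time, in seconds (1230 for 20:30)
  timestamp: number;    // wall-clock time at which the tag was requested
}

// Assumed helpers: the recorder resolves with the transcribed voice once the
// long press ends; buildVideoTag creates and stores the tag.
declare function recordVoice(): Promise<string>;
declare function buildVideoTag(voice: string, frame: FrameInfo): Promise<void>;

// Long-press handler on the voice input control: capture the current video
// time and frame info first, then record the voice and generate the tag.
async function onVoiceControlPressed(
  video: HTMLVideoElement,
  videoId: string,
  videoTitle: string
): Promise<void> {
  const frame: FrameInfo = {
    videoId,
    videoTitle,
    currentTime: video.currentTime,
    timestamp: Date.now(),
  };
  const voice = await recordVoice(); // e.g. "bookmark"
  await buildVideoTag(voice, frame); // tag = voice + current time + frame info
}
```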
In an embodiment, after the video tag corresponding to the current video time in the target video is generated from the voice information, the current video time, and the video frame image information, the video tag, the voice information, the current video time, the video frame image information, and related data may be stored in a database. The data structure of the video tag may include: the video ID of the target video, the video name of the target video, the video subtitle corresponding to the target video, a link to the target video image, the current video time, the total duration of the target video, the content information of the video tag, and so on.
The video subtitle may indicate which episode the target video is. For example, if the target video is episode 3 of season 2 of TV series 1, the video subtitle may be "Episode 3".
The target video image may be the video image corresponding to the video tag in the target video. For example, if the video tag corresponds to the 20th minute and 30th second of episode 3 of season 2 of TV series 1, the target video image may be the video frame at that position; if the video tag instead corresponds to the 20th to 30th minute of the episode, the target video image may be a poster image for season 2 of TV series 1, and so on.
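Purely as an illustration of the record enumerated above (the field names are assumptions, not the patent's actual schema), the stored tag could be modeled as:

```typescript
// Illustrative tag record mirroring the fields listed above; names are assumed.
interface VideoTag {
  videoId: string;        // video ID of the target video
  videoName: string;      // video name, e.g. "TV series 1"
  videoSubtitle: string;  // episode label, e.g. "Episode 3"
  imageUrl: string;       // link to the target video image (frame or poster)
  currentTime: number;    // tagged video time, in seconds
  totalDuration: number;  // total duration of the target video, in seconds
  content?: string;       // optional tag content, e.g. "plot point 1"
}
```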
In one embodiment, since a video tag belongs to a single user, it is necessary to determine whether the user has logged in to the video playing client before creating the video tag. Specifically, the step of acquiring the video frame image information corresponding to the current video time and the currently recorded voice information when a voice input operation for the voice input control is detected may include:
when a voice input operation for the voice input control is detected, detecting the user's login state with respect to the video playing client to obtain user login status information;
and when the user login status information indicates that the user has logged in to the video playing client, acquiring the video frame image information corresponding to the current video time and the currently recorded voice information.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control, a video tag generation request is triggered. As shown in fig. 7, the terminal detects the user's login state according to the video tag generation request. If the user has logged in to the video playing client, the conditions for constructing a video tag are satisfied, and the steps of acquiring the video frame image information and the voice information can be executed. If the user has not logged in, the conditions are not satisfied; a login request is therefore sent to the user, who can log in to the video playing client with a user account according to the login request and then proceed with the video tag construction.
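A minimal sketch of this login gate, with assumed helper names (isLoggedIn, promptLogin, and acquireFrameInfoAndVoice are illustrative, not real APIs):

```typescript
// Assumed helpers for the login check described above.
declare function isLoggedIn(): Promise<boolean>;
declare function promptLogin(): void;
declare function acquireFrameInfoAndVoice(): Promise<void>;

async function onVoiceInputDetected(): Promise<void> {
  if (!(await isLoggedIn())) {
    promptLogin();                    // send a login request to the user
    return;                           // tag construction requires a logged-in user
  }
  await acquireFrameInfoAndVoice();   // conditions met: continue tag construction
}
```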
In an embodiment, a video tag in the embodiments of the present application can mark not only a single moment in the target video but also a time period in it, so the type of video tag can be distinguished according to the recorded voice information. Specifically, the step of acquiring the video frame image information corresponding to the current video time and the currently recorded voice information when a voice input operation for the voice input control is detected may include:
when a voice input operation for the voice input control is detected, acquiring the current video time and the video frame image information corresponding to the current video time;
acquiring the voice information currently recorded through the voice input control;
acquiring, based on the voice information, the video unit to be marked corresponding to the current video time from the target video;
and the step of generating the video tag corresponding to the current video time based on the voice information and the video frame image information then includes:
generating a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information.
The video unit to be marked is the video unit that the video tag marks; it may be a single video frame of the target video or a video clip of the target video. For example, the video unit to be marked may be the video frame at the 20th minute and 30th second of episode 3 of TV series 1, or the video clip covering the 20th to 22nd minute of episode 3 of TV series 1.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control, a video tag generation request is triggered, and the user may then say "bookmark" into the voice input control. The terminal determines the 20th minute and 30th second of episode 3 of season 2 of TV series 1 as the current video time according to the video tag generation request and acquires the corresponding video frame image information. It also acquires the voice information "bookmark" recorded through the voice input control, then acquires the video frame corresponding to the 20th minute and 30th second from episode 3 of season 2 of TV series 1 as the video unit to be marked, and generates a video tag corresponding to that video unit from the voice information, the current video time, and the video frame image information.
In an embodiment, the video tag may correspond to a single moment in the target video. Specifically, the step of acquiring, based on the voice information, the video unit to be marked corresponding to the current video time from the target video may include:
when the voice information is detected to include first tag type information, acquiring the current video frame image corresponding to the current video time from the target video;
and determining the current video frame image as the video unit to be marked.
The first tag type information may be information indicating that the video tag to be constructed is of the bookmark type. A bookmark-type video tag corresponds to a single moment in the target video, and its video unit to be marked is a single video frame of the target video. For example, in the embodiments of the present application, the first tag type information may be preset to "bookmark"; when the voice information is detected to include "bookmark", it can be determined that a bookmark-type video tag needs to be constructed.
In practical applications, for example, with the first tag type information preset to "bookmark", when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark", the voice information input by the user includes "bookmark", so it is determined that the voice information includes the first tag type information. The current video frame image corresponding to the 20th minute and 30th second can then be acquired from episode 3 of season 2 of TV series 1 and determined as the video unit to be marked.
In an embodiment, the video tag may instead correspond to a time period in the target video. Specifically, the step of acquiring, based on the voice information, the video unit to be marked corresponding to the current video time from the target video may include:
when the voice information is detected to include second tag type information, determining the operation start time point and the operation end time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation start time point and the operation end time point;
and determining the current video clip as the video unit to be marked.
The second tag type information may be information indicating that the video tag to be constructed is of the short-video type. A short-video-type video tag corresponds to a time period in the target video, and its video unit to be marked is a video clip of the target video. For example, in the embodiments of the present application, the second tag type information may be preset to "record short video"; when the voice information is detected to include "record short video", it can be determined that a short-video-type video tag needs to be constructed.
In practical applications, for example, with the second tag type information preset to "record short video", suppose the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page when the user long-presses the button-shaped voice input control, says "record short video" into it, and releases the control at the 22nd minute and 30th second. Since the voice information input by the user includes "record short video", it is determined that the voice information includes the second tag type information; the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is determined as the operation start time point of the voice input operation, and the 22nd minute and 30th second as the operation end time point. The current video clip covering the 20th minute and 30th second to the 22nd minute and 30th second is then acquired from episode 3 of season 2 of TV series 1 and determined as the video unit to be marked.
In an embodiment, the first tag type information and the second tag type information are not limited to "bookmark" and "record short video"; their content can be adjusted to the needs of the actual application, as long as the type of video tag to be constructed can be distinguished from them. The dispatch between the two types is sketched below.
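A hedged sketch of that dispatch, assuming speech-to-text has already produced a transcript; the keywords simply mirror the examples in the text, and all names are illustrative:

```typescript
// Decide the video unit to be marked from the transcribed voice information.
type MarkUnit =
  | { kind: "frame"; time: number }               // bookmark type: one video frame
  | { kind: "clip"; start: number; end: number }; // short-video type: a video clip

function resolveMarkUnit(
  transcript: string,
  pressStart: number, // video time at the operation start point, in seconds
  pressEnd: number    // video time at the operation end point, in seconds
): MarkUnit | null {
  if (transcript.includes("bookmark")) {
    return { kind: "frame", time: pressStart };                // first tag type
  }
  if (transcript.includes("record short video")) {
    return { kind: "clip", start: pressStart, end: pressEnd }; // second tag type
  }
  return null; // not a tag-generation instruction
}
```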
In an embodiment, the user may also add content information to the video tag, so that when viewing the video tag list the user can intuitively see the information of interest from that content information. Specifically, the step of generating the video tag corresponding to the current video time based on the voice information and the video frame image information may include:
when the voice information is detected to comprise the tag content voice information, converting the tag content voice information into tag content text information;
and generating a video tag corresponding to the current video time based on the tag content text information and the video frame image information.
The tag content voice information may be voice information containing a summary of the currently playing video content, a caption added to the currently playing video, the user's viewing impressions, and so on. For example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark plot point 1", "plot point 1" can be determined as the tag content voice information, indicating that the content played at the 20th minute and 30th second of episode 3 of season 2 of TV series 1 relates to plot point 1.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark plot point 1", "plot point 1" is determined as the tag content voice information and converted into the tag content text information "plot point 1". A video tag corresponding to the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is then generated from the tag content text information and the acquired video frame image information; the video tag includes the tag content "plot point 1".
As another example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing and the user long-presses the button-shaped voice input control, says "record short video plot point 1", and releases the control at the 22nd minute and 30th second, "plot point 1" is determined as the tag content voice information and converted into the tag content text information "plot point 1". A video tag corresponding to the 20th minute and 30th second through the 22nd minute and 30th second of episode 3 of season 2 of TV series 1 is then generated from the tag content text information and the acquired video frame image information; the video tag includes the tag content "plot point 1".
In an embodiment, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark viewing notes", "viewing notes" is determined as the tag content voice information and converted into the tag content text information "viewing notes", and a video tag including the tag content "viewing notes" is generated from it and the acquired video frame image information. In this way, the notes or impressions a user records while watching a video can be preserved, so that when viewing the video tag list the user can revisit the notes or feelings recorded earlier, much as with annotations made while reading a novel.
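As an illustrative sketch (not the patent's algorithm) of separating the tag content from the type keyword, one could do:

```typescript
// "bookmark plot point 1" -> "plot point 1"; keywords mirror the examples above.
function extractTagContent(transcript: string): string | undefined {
  const match = transcript.match(/^(bookmark|record short video)\s*(.*)$/);
  const content = match?.[2]?.trim();
  return content || undefined; // undefined when only the type keyword was spoken
}

// extractTagContent("bookmark plot point 1") === "plot point 1"
// extractTagContent("bookmark")              === undefined
```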
In an embodiment, since the embodiments of the present application need to recognize the voice information recorded by the user and extract the content it contains, interference with the recorded voice information should be reduced as much as possible so that the voice information is as clear as possible. Specifically, the step of acquiring the video frame image information corresponding to the current video time and the currently recorded voice information when a voice input operation for the voice input control is detected may include:
when a voice input operation for the voice input control is detected, acquiring the video frame image information corresponding to the current video time;
muting the audio in the target video, and playing the target video with the audio muted;
and acquiring the currently recorded voice information.
In practical applications, for example, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark", the target video in the video playing page keeps playing while the user says "bookmark". The audio in the target video can therefore be muted during the long press, so that the target video continues to play silently; when the user releases the voice input control, the audio in the target video is restored and the target video continues to play with sound. In this way, the voice information is captured with the least interference, which improves the accuracy of tag generation.
In an embodiment, during the long press the volume of the audio in the target video may instead merely be lowered, so that the target video continues to play at low volume; when the long press ends, the volume is restored and the target video continues to play at the original volume. On the one hand this reduces the influence of the target video's audio on the voice information; on the other hand it does not disrupt the user's viewing.
In an embodiment, for example, when the user is detected long-pressing the button-shaped voice input control, playback of the target video can also be paused so that the user records voice information without interference; when the voice information is detected to include "bookmark", playback of the target video resumes once the user releases the control, so the video continues automatically without any extra operation by the user.
As another example, when the user is detected long-pressing the voice input control, playback can be paused so that the user records voice information without interference; when the voice information is detected to include "record short video", playback resumes automatically, and when the user releases the control, the operation start time point and operation end time point of the voice input operation are recorded. During the long press, the target video can be played with its volume lowered or muted so that the user's recording of voice information is not disturbed. A sketch of this audio handling follows.
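A minimal sketch, assuming a browser client and the standard HTMLMediaElement volume property; whether to mute, merely lower the volume (the 0.2 below is an arbitrary example), or pause playback is a design choice the text leaves open:

```typescript
// Quiet the target video's audio while recording, then restore it.
async function recordWithQuietPlayback(
  video: HTMLVideoElement,
  record: () => Promise<string>, // resolves with the voice once the press ends
  duckFactor = 0                 // 0 = mute entirely; e.g. 0.2 = low volume
): Promise<string> {
  const originalVolume = video.volume;
  video.volume = originalVolume * duckFactor; // lower or mute during the press
  try {
    return await record();                    // playback continues meanwhile
  } finally {
    video.volume = originalVolume;            // restore audio after the press
  }
}
```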
In an embodiment, because of the uncertainty of voice information, the voice information needs to be checked in several steps to improve the accuracy of tag generation. For example, as shown in fig. 6, when the 20th minute and 30th second of episode 3 of season 2 of TV series 1 is playing in the video playing page and the user is detected long-pressing the button-shaped voice input control and saying "bookmark", the client voice service uploads an instruction to the server; the instruction may include the voice information "bookmark", the current video time, and the video frame image information corresponding to the current video time. After receiving the instruction, the server judges the validity of the voice information in it, that is, whether the voice information is usable; if the voice information is detected to be invalid, an error can be reported back to the client.
If the voice information in the instruction is detected to be valid, the server further judges whether the instruction is a tag generation instruction; if it is not, the other voice instruction corresponding to it can be executed. If the instruction is a tag generation instruction, the tag creation flow is entered and a video tag is generated. After the video tag is generated, the video tag, the current video time carried in the instruction, and the video frame image information corresponding to the current video time can be stored in the user's database on the server. The video tag is then sent to the client, and the client's database is updated with the video tag, the current video time, and the corresponding video frame image information. Once the tag has been stored, the user can be informed through a message prompt box or a voice prompt that the video bookmark has been created.
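A rough server-side sketch of the flow in fig. 6, under the same assumptions as the earlier snippets (every name here is hypothetical, and the error handling is illustrative only):

```typescript
interface TagInstruction {
  voice: string;       // e.g. "bookmark"
  currentTime: number; // current video time, in seconds
  frameInfo: unknown;  // video frame image information
}

// Assumed helpers for validity checking, other commands, and persistence.
declare function isValidVoice(voice: string): boolean;
declare function runOtherVoiceCommand(voice: string): Promise<unknown>;
declare function createAndStoreTag(i: TagInstruction): Promise<unknown>;

async function handleInstruction(i: TagInstruction): Promise<unknown> {
  if (!isValidVoice(i.voice)) {
    return { error: "invalid voice information" }; // reported back to the client
  }
  const isTagInstruction =
    i.voice.includes("bookmark") || i.voice.includes("record short video");
  if (!isTagInstruction) {
    return runOtherVoiceCommand(i.voice); // execute the other voice instruction
  }
  const tag = await createAndStoreTag(i); // tag creation flow + server database
  return tag;                             // client then updates its own database
}
```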
In an embodiment, after the video tag is generated, it can also be uploaded to the cloud so that multiple platforms stay consistent. For example, user 1 logs in to video playing client 1 with user account 1 and adds a video tag at the 20th minute and 30th second of episode 3 of season 2 of TV series 1. When the user later logs in to video playing client 2 with user account 1, the video tag corresponding to the 20th minute and 30th second of episode 3 of season 2 of TV series 1 can also be found in video playing client 2.
203. When a tag viewing operation for the tag viewing control is detected, display a video tag list.
The tag viewing operation may be an operation in which the user views tags through the tag viewing control. For example, when the user clicks the button-shaped tag viewing control, it can be determined that the user wants to view the video tag list, and the video tag list is displayed for the user to view.
The video tag list includes at least one generated video tag arranged in a predetermined order. A video tag list may cover a TV series or a movie; within it, the video tags can be arranged in episode order, that is, the video tags belonging to the same episode are arranged chronologically, and the episodes of the series are arranged and displayed by episode number. For example, a video tag list for TV series 1 may include two video tags for episode 1 and one video tag for episode 2, the two episode-1 tags being arranged in the chronological order of their video times.
In practical applications, for example, as shown in fig. 9, when the user is detected clicking the button-shaped tag viewing control in the video playing page, a video tag list can be displayed on the left side of the video playing page; the list includes two video tags for episode 1 of TV series 1 and one for episode 2, the two episode-1 tags being arranged chronologically.
In an embodiment, video tags belonging to several seasons of a series can also be presented in the same video tag list. For example, as shown in fig. 5, the video tag list may include two video tags for episode 1 of season 1 of TV series 1, one for episode 2 of season 1, and one for episode 1 of season 2, the two episode-1 tags being arranged in the chronological order of their video times.
In an embodiment, the video tag list may be arranged according to a hierarchy of the video set, and in particular, the tag processing method may further include:
acquiring video tags corresponding to a plurality of videos in a video set, wherein the video set comprises videos corresponding to a plurality of levels;
and arranging a plurality of video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
Wherein the video set may be a set composed of multiple videos belonging to one series; for example, the episodes of a television series may compose a video set, and the episodes across a series' multiple seasons may likewise compose a video set, and so on.
The hierarchy may be a standard for dividing the videos in the video set. For example, if the video set includes episodes 1-6 of the first season of television series 1 and episodes 1-6 of its second season, the video set may first be divided, at the season level, into the videos of the first season and the videos of the second season. The videos of the first season can then be divided, at the episode level, into 6 videos corresponding to episodes 1-6, the videos of the second season can likewise be divided into 6 videos corresponding to episodes 1-6, and so on.
In practical applications, for example, as shown in fig. 5, the video set includes episodes 1-2 of the first season of television series 1 and episode 1 of its second season. The 4 video tags corresponding to the video set may be obtained; the video set is then divided, at the season level, into the videos of the first season and the videos of the second season; the videos of the first season are divided, at the episode level, into 2 videos corresponding to episodes 1-2, the videos of the second season into 1 video corresponding to episode 1, and so on. The seasons are then arranged by season number, the videos belonging to the same season are arranged by episode, and the video tags corresponding to each video are arranged chronologically, yielding the video tag list.
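A minimal TypeScript sketch of this arrangement, assuming each tag carries its season and episode numbers (field names are illustrative, not from the patent):

interface ListedTag {
  season: number;     // first hierarchy level: season number
  episode: number;    // second hierarchy level: episode number
  videoTime: number;  // current video time of the tag, in seconds
  content: string;    // tag content
}

// Seasons are ordered by season number, episodes within a season by episode
// number, and tags within an episode by their video time.
function buildTagList(tags: ListedTag[]): ListedTag[] {
  return [...tags].sort(
    (a, b) =>
      a.season - b.season ||
      a.episode - b.episode ||
      a.videoTime - b.videoTime
  );
}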
In an embodiment, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing control in the form of a button in the video playing page, a video tag list may be displayed. The video tag list includes 2 video tags corresponding to the first episode of television series 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of television series 1, where each episode is followed by the total duration of that episode's video, each video tag is followed by the video time corresponding to the tag and the tag content corresponding to the tag, and each video tag corresponds to a target video image.
In an embodiment, because a television series may update over a long time span, the user may no longer remember earlier plot content when watching later episodes, and reviewing it would otherwise require many fast-forward and rewind operations, which is cumbersome and inefficient. After the video tag list is displayed, the user can learn the earlier plot content from the tag contents of the video tags in the list, achieving the purpose of plot review; the contents the user pays attention to can also be learned, so as to provide the user with more accurate content recommendations. Likewise, for videos such as mystery and suspense dramas, the user can trace the clues of an event and how it develops from the tag contents of the video tags in the list.
In an embodiment, the user may further perform video skip play based on the video tag list, and specifically, the tag processing method may further include:
when detecting skip play operation for a target video tag in the video tag list, determining target video time corresponding to the target video tag and video to be played corresponding to the target video tag;
acquiring a video clip to be played from the video to be played based on the target video time;
and skipping to play the video clip to be played.
The skip play operation may be an operation by which the user skip-plays video from the video tag list. For example, the user may click the video tag corresponding to the 20th minute and 30th second of the third episode of television series 1 in the video tag list, and playback then jumps to start from the 20th minute and 30th second of that episode.
In practical applications, for example, as shown in fig. 8, after the video tag list is displayed, when the user clicks a target video tag in the list, it may be determined that the target video tag corresponds to the 20th minute and 30th second of the third episode of the second season of television series 1, and the video clip to be played is determined to be the clip beginning at the 20th minute and 30th second of that episode; playback in the video playing page then jumps to the video clip to be played.
In an embodiment, since a video tag may also be specific to a time slot in the target video, when the user clicks a target video tag for a time slot in the video tag list, it may be determined that the target video tag corresponds to the video clip from the 20th minute 30th second to the 22nd minute 30th second of the third episode of television series 1; that clip may be determined as the video clip to be played, and playback in the video playing page then jumps to it.
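The two cases can be sketched together in TypeScript; the player interface here is an assumption for illustration, not an API from the patent:

interface PlayerLike {
  load(videoId: string): void;    // switch to the video the tag belongs to
  seek(seconds: number): void;    // jump to a position in the current video
  stopAt(seconds: number): void;  // end playback at a given time
}

interface TargetTag {
  videoId: string; // video to be played
  start: number;   // target video time, in seconds (e.g. 20 * 60 + 30)
  end?: number;    // present only for a time-slot (segment) tag
}

function onSkipPlay(player: PlayerLike, tag: TargetTag): void {
  player.load(tag.videoId);
  player.seek(tag.start);      // jump to the tagged moment
  if (tag.end !== undefined) {
    player.stopAt(tag.end);    // segment tag: play only up to e.g. 22:30
  }
}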
In an embodiment, the tag processing method of the embodiments of the present application is not limited to generating video tags from voice; video tags can also be generated from gestures. For example, when the 20th minute and 30th second of the third episode of the second season of television series 1 is playing in the video playing page and the terminal camera captures the user waving a hand to the left, a video tag corresponding to the 20th minute and 30th second of that episode can be generated.
There may be various preset gestures for generating a video tag, and the embodiments of the present application do not unduly limit the user's gesture; for example, the preset gestures may include waving a hand leftwards, waving a hand rightwards, and so on.
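As a hedged sketch, the gesture path can reuse the same tag creation flow as the voice path; the gesture recognizer and its labels are assumptions for illustration:

// Preset gestures that trigger tag creation (labels are illustrative).
const PRESET_GESTURES = new Set(["wave_left", "wave_right"]);

function onGestureRecognized(
  gesture: string,
  currentVideoTime: number,
  createTag: (videoTime: number) => void
): void {
  if (PRESET_GESTURES.has(gesture)) {
    createTag(currentVideoTime); // same flow as the voice-triggered tag
  }
}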
As can be seen from the above, the embodiments of the present application may display a video playing page corresponding to a target video in a video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, the video tag including the current video time and the video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, the list including at least one generated video tag arranged in a predetermined order. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing. Moreover, the user can add tag content to a video tag, so that when viewing the video tag list, the user can clearly and intuitively learn from the added tag content which content the user pays attention to, the user's viewing experience, how the plot develops, and so on, which facilitates plot review. Furthermore, the user can use the video tag list for skip play: when the user's skip play operation is detected, the video playing page can jump directly to the video clip the user cares about. In addition, a video tag can mark not only a moment in a video but also a time period in it, so the user can save a video clip with a video tag, add tag content to the clip, and use the tag for fast and convenient skip play.
The method according to the previous embodiment will be described in further detail below, by way of an example in which the tag processing apparatus is integrated in an electronic device.
Referring to fig. 3, a specific flow of the tag processing method according to the embodiment of the present application may be as follows:
301. The electronic device displays the video playing page corresponding to the second episode of television series 1 in the video playing client.
In practical applications, for example, as shown in fig. 10, user 1 may log in to the video playing client using account 1 and watch the second episode of television series 1 through the client, with the episode playing in the client's video playing page. The video playing page may further include a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags.
302. When detecting that the user long-presses the voice input button, the electronic device checks the user's login status.
In practical applications, for example, when the 20th minute and 30th second of the second episode of television series 1 is playing in the video playing page and the user is detected long-pressing the voice input button, a video tag generation request is triggered. As shown in fig. 7, the electronic device may check the user's login status according to the video tag generation request; if the user is detected to have logged in to the video playing client, the condition for constructing a video tag is satisfied, and the subsequent steps of the tag processing method can be performed.
If the user is detected not to have logged in to the video playing client, the condition for constructing a video tag is not met; a login request can therefore be sent to the user, the user logs in to the video playing client with a user account according to the request, and the subsequent steps of the tag processing method then proceed.
303. When the user has logged in to the video playing client, the electronic device acquires the video frame image information corresponding to the current video time and the currently recorded voice information.
In practical applications, for example, when it is detected that the user has logged in to the video playing client, the 20th minute and 30th second of the second episode of television series 1 may be determined as the current video time, and the video frame image information corresponding to the current video time is acquired. The video frame image information may include one or more of the video title "television series 1", the secondary video title "second episode", the unique video identifier corresponding to the second episode of television series 1, and a timestamp. The voice information recorded by the user through the voice input button is also acquired.
After obtaining the video frame image information corresponding to the current video time and the currently recorded voice information, the client voice service can upload an instruction to the server, the instruction including that video frame image information and voice information. After the server obtains the instruction, it can judge the validity of the voice information in it, that is, whether the voice information is invalid; if the voice information is detected to be invalid, an error can be fed back to the client. If the voice information in the instruction is detected to be valid, whether the instruction is a tag generation instruction can be further judged; if not, the other voice command corresponding to the instruction can be executed. If it is a tag generation instruction, the subsequent video tag generation step can be entered.
In an embodiment, because the video in the video playing page keeps playing while the user records voice information through the voice input button, the audio of the video can be muted while the user holds the voice input button, so that the video continues playing silently; when the user releases the voice input button, the audio is restored and the video continues playing with audio.
In an embodiment, while the user holds the voice input button, the volume of the video's audio can instead be lowered so that playback continues at low volume; when the user releases the voice input button, the volume is restored and the video continues playing at its original volume.
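Both variants can be sketched with the standard HTML media API; the recorder calls are placeholders for whatever speech-capture API is used:

declare function startRecording(): void; // placeholder recorder API
declare function stopRecording(): void;

let savedVolume = 1.0;

// Button pressed: playback continues, but the audio is silenced (or ducked)
// so it does not interfere with the voice recording.
function onVoiceButtonDown(video: HTMLVideoElement, duck = false): void {
  savedVolume = video.volume;
  if (duck) video.volume = 0.2; // low-volume variant
  else video.muted = true;      // silent variant
  startRecording();
}

// Button released: restore the audio and stop recording.
function onVoiceButtonUp(video: HTMLVideoElement): void {
  video.muted = false;
  video.volume = savedVolume;
  stopRecording();
}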
304. When the voice information includes "bookmark", the electronic device acquires the video unit to be marked corresponding to the 20th minute and 30th second from the second episode of television series 1.
In practical applications, for example, after the voice information recorded by the user is obtained, the voice information can be analyzed; when it is detected to include "bookmark", this indicates that the user needs to construct a bookmark-type video tag. The electronic device may obtain the video frame corresponding to the 20th minute and 30th second from the second episode of television series 1 and use that video frame as the video unit to be marked.
305. Based on the voice information and the video frame image information, the electronic device generates a video tag corresponding to the video unit to be marked.
In practical applications, for example, when the voice information is detected to include "bookmark", a bookmark-type video tag may be constructed; when the voice information is detected to further include "scenario 1", this indicates that the user needs to add the tag content "scenario 1" to the video tag, and a video tag including the tag content "scenario 1" may be constructed, the tag corresponding to the 20th minute and 30th second of the second episode of television series 1.
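A minimal sketch of this interpretation step, assuming the speech has already been recognized as text, a leading keyword selects the tag type, and the remainder becomes the tag content (the keyword strings are stand-ins):

type TagType = "bookmark" | "shortVideo";

function parseTagSpeech(text: string): { type: TagType; content: string } | null {
  if (text.startsWith("record short video")) {
    return { type: "shortVideo", content: text.slice("record short video".length).trim() };
  }
  if (text.startsWith("bookmark")) {
    return { type: "bookmark", content: text.slice("bookmark".length).trim() };
  }
  return null; // not a tag generation instruction
}

// parseTagSpeech("bookmark scenario 1")
//   -> { type: "bookmark", content: "scenario 1" }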
In an embodiment, for example, after the video tag is generated, the user may be prompted through a message prompt box that the video tag has been generated successfully; the message prompt box may read "tag generated successfully". As another example, after the video tag is generated, a voice prompt such as "tagging succeeded" may also indicate to the user that the video tag has been created successfully.
In an embodiment, after the video tag is generated, the video tag may also be shared. For example, after user 1 shares the video tag for the 20th minute and 30th second of the second episode with user 2, user 2's video playing client may prompt user 2 that user 1 has shared the video tag corresponding to the 20th minute and 30th second of the second episode.
In an embodiment, after the video tag is generated, the video tag, together with the voice information, the current video time, the video frame image information, and other such information, may be stored in the database. The data structure of a video tag may include: the video ID, the video name, the video subtitle, a link to the target video image, the current video time, the total duration of the target video, the content information of the video tag, and so on.
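An illustrative TypeScript shape for such a record, following the fields listed above (field names and types are assumptions):

interface StoredVideoTag {
  videoId: string;       // unique video identifier
  videoName: string;     // video name, e.g. "television series 1"
  videoSubtitle: string; // video subtitle, e.g. "second episode"
  imageLink: string;     // link to the target video image
  videoTime: number;     // current video time of the tag, in seconds
  totalDuration: number; // total duration of the target video, in seconds
  content: string;       // content information of the video tag
}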
In an embodiment, after the video tag is generated, the video tag may also be uploaded to the cloud so that it stays consistent across multiple platforms. For example, user 1 logs in to video playing client 1 using user account 1 and adds a video tag at the 20th minute and 30th second of the third episode of the second season of television series 1. Then, when the user logs in to video playing client 2 using user account 1, the video tag corresponding to the 20th minute and 30th second of that episode can also be found in video playing client 2.
306. When it is detected that the user clicks the tag viewing button, the electronic device displays the video tag list.
In practical applications, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed. The video tag list includes 2 video tags corresponding to the first episode of television series 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of television series 1, where each episode is followed by the total duration of that episode's video, each video tag is followed by the video time corresponding to the tag and the tag content corresponding to the tag, and each video tag corresponds to a target video image.
In an embodiment, for example, as shown in fig. 8, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed on the left side of the video playing page. The video tag list includes 2 video tags corresponding to the first episode of television series 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of television series 1, where each video tag is followed by the video time corresponding to the tag and the tag content corresponding to the tag.
307. When it is detected that the user clicks the area corresponding to a target video tag in the video tag list, the electronic device jumps to play the video clip to be played.
In practical applications, for example, after the video tag list is displayed, when it is detected that the user clicks the area corresponding to the target video tag in the list, it may be determined that the target video tag corresponds to the 20th minute and 30th second of the second episode of television series 1, and the video clip to be played is determined to be the clip beginning at the 20th minute and 30th second of that episode; playback in the video playing page then jumps to the video clip to be played.
As can be seen from the foregoing, in the embodiments of the present application, the electronic device may display the video playing page corresponding to the second episode of television series 1 in the video playing client; when it is detected that the user long-presses the voice input button, the user's login status is checked; when the user has logged in to the video playing client, the video frame image information corresponding to the current video time and the currently recorded voice information are obtained; when the voice information includes "bookmark", the video unit to be marked corresponding to the 20th minute and 30th second is obtained from the second episode of television series 1, and a video tag corresponding to that video unit is generated based on the voice information and the video frame image information; when it is detected that the user clicks the tag viewing button, the video tag list is displayed; and when it is detected that the user clicks the area corresponding to a target video tag in the list, playback jumps to the video clip to be played. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing. Moreover, the user can add tag content to a video tag, so that when viewing the video tag list, the user can clearly and intuitively learn from the added tag content which content the user pays attention to, the user's viewing experience, how the plot develops, and so on, which facilitates plot review. Furthermore, the user can use the video tag list for skip play: when the user's skip play operation is detected, the video playing page can jump directly to the video clip the user cares about. In addition, a video tag can mark not only a moment in a video but also a time period in it, so the user can save a video clip with a video tag, add tag content to the clip, and use the tag for fast and convenient skip play.
The method according to the previous embodiment will be described in further detail below, by way of an example in which the tag processing apparatus is integrated in an electronic device.
Referring to fig. 4, a specific flow of the tag processing method according to the embodiment of the present application may be as follows:
401. The electronic device displays the video playing page corresponding to the second episode of television series 1 in the video playing client.
In practical applications, the specific steps for displaying a video playing page have been described above and are not repeated here.
402. When detecting that the user long-presses the voice input button, the electronic device checks the user's login status.
In practical applications, for example, when the 20th minute and 30th second of the second episode of television series 1 is playing in the video playing page, the user is detected long-pressing the voice input button and voice information is recorded; when the user releases the voice input button at the 22nd minute and 30th second, a video tag generation request is triggered. As shown in fig. 7, the electronic device may check the user's login status according to the video tag generation request; if the user is detected to have logged in to the video playing client, the condition for constructing a video tag is satisfied, and the subsequent steps of the tag processing method can be performed.
403. When the user has logged in to the video playing client, the electronic device acquires the video frame image information corresponding to the current video time and the currently recorded voice information.
In practical applications, for example, when it is detected that the user has logged in to the video playing client, the 20th minute and 30th second of the second episode of television series 1 may be determined as the current video time, and the video frame image information corresponding to the current video time is acquired. The video frame image information may include one or more of the video title "television series 1", the secondary video title "second episode", the unique video identifier corresponding to the second episode of television series 1, and a timestamp. The voice information recorded by the user through the voice input button is also acquired.
After obtaining the video frame image information corresponding to the current video time and the currently recorded voice information, the client voice service can upload an instruction to the server, the instruction including that video frame image information and voice information. After the server obtains the instruction, it can judge the validity of the voice information in it, that is, whether the voice information is invalid; if the voice information is detected to be invalid, an error can be fed back to the client. If the voice information in the instruction is detected to be valid, whether the instruction is a tag generation instruction can be further judged; if not, the other voice command corresponding to the instruction can be executed. If it is a tag generation instruction, the subsequent video tag processing step can be entered.
In an embodiment, because the video in the video playing page keeps playing while the user records voice information through the voice input button, the audio of the video can be muted while the user holds the voice input button, so that the video continues playing silently; when the user releases the voice input button, the audio is restored and the video continues playing with audio.
In an embodiment, while the user holds the voice input button, the volume of the video's audio can instead be lowered so that playback continues at low volume; when the user releases the voice input button, the volume is restored and the video continues playing at its original volume.
404. When the voice information includes "record short video", the electronic device acquires the video unit to be marked corresponding to the 20th minute 30th second through the 22nd minute 30th second from the second episode of television series 1.
In practical applications, for example, after the voice information recorded by the user is obtained, the voice information can be analyzed; when it is detected to include "record short video", this indicates that the user needs to construct a short-video-type video tag. The electronic device may obtain the video clip from the 20th minute 30th second to the 22nd minute 30th second of the second episode of television series 1 and use that video clip as the video unit to be marked.
405. Based on the voice information and the video frame image information, the electronic device generates a video tag corresponding to the video unit to be marked.
In practical applications, for example, when the voice information is detected to include "record short video", a short-video-type video tag may be constructed; when the voice information is detected to further include "scenario 1", this indicates that the user needs to add the tag content "scenario 1" to the video tag, and a video tag including the tag content "scenario 1" may be constructed, the tag corresponding to the 20th minute 30th second through the 22nd minute 30th second of the second episode of television series 1.
In an embodiment, for example, after the video tag is generated, the user may be prompted through a message prompt box that the video tag has been generated successfully; the message prompt box may read "tag generated successfully". As another example, after the video tag is generated, a voice prompt such as "tagging succeeded" may also indicate to the user that the video tag has been created successfully.
In an embodiment, after the video tag is generated, the video tag may also be shared. For example, after user 1 shares the video tag for the 20th minute 30th second through the 22nd minute 30th second of the second episode of television series 1 with user 2, user 2's video playing client may prompt user 2 that user 1 has shared the video tag corresponding to that segment.
In an embodiment, after the video tag is generated, the video tag, together with the voice information, the current video time, the video frame image information, and other such information, may be stored in the database. The data structure of a video tag may include: the video ID, the video name, the video subtitle, a link to the target video image, the current video time, the total duration of the target video, the content information of the video tag, and so on.
In an embodiment, after the video tag is generated, the video tag may also be uploaded to the cloud so that it stays consistent across multiple platforms. For example, user 1 logs in to video playing client 1 using user account 1 and adds a video tag covering the 20th minute 30th second through the 22nd minute 30th second of the third episode of the second season of television series 1. Then, when the user logs in to video playing client 2 using user account 1, the video tag corresponding to that segment can also be found in video playing client 2.
406. When it is detected that the user clicks the tag viewing button, the electronic device displays the video tag list.
In practical applications, for example, as shown in fig. 5, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed. The video tag list includes 2 video tags corresponding to the first episode of television series 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of television series 1, where each episode is followed by the total duration of that episode's video, each video tag is followed by the video time corresponding to the tag and the tag content corresponding to the tag, and each video tag corresponds to a target video image.
In an embodiment, for example, as shown in fig. 8, when it is detected that the user clicks the tag viewing button in the video playing page, a video tag list may be displayed on the left side of the video playing page. The video tag list includes 2 video tags corresponding to the first episode of television series 1, 1 video tag corresponding to the second episode, and 1 video tag corresponding to the first episode of the second season of television series 1, where each video tag is followed by the video time corresponding to the tag and the tag content corresponding to the tag.
407. When it is detected that the user clicks the area corresponding to a target video tag in the video tag list, the electronic device jumps to play the video clip to be played.
In practical applications, for example, after the video tag list is displayed, when it is detected that the user clicks the area corresponding to the target video tag in the list, it may be determined that the target video tag corresponds to the 20th minute 30th second through the 22nd minute 30th second of the second episode of television series 1, and the video clip to be played is determined to be that clip; playback in the video playing page then jumps to the video clip to be played.
As can be seen from the foregoing, in the embodiments of the present application, the electronic device may display the video playing page corresponding to the second episode of television series 1 in the video playing client; when it is detected that the user long-presses the voice input button, the user's login status is checked; when the user has logged in to the video playing client, the video frame image information corresponding to the current video time and the currently recorded voice information are obtained; when the voice information includes "record short video", the video unit to be marked corresponding to the 20th minute 30th second through the 22nd minute 30th second is obtained from the second episode of television series 1, and a video tag corresponding to that video unit is generated based on the voice information and the video frame image information; when it is detected that the user clicks the tag viewing button, the video tag list is displayed; and when it is detected that the user clicks the area corresponding to a target video tag in the list, playback jumps to the video clip to be played. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing. Moreover, the user can add tag content to a video tag, so that when viewing the video tag list, the user can clearly and intuitively learn from the added tag content which content the user pays attention to, the user's viewing experience, how the plot develops, and so on, which facilitates plot review. Furthermore, the user can use the video tag list for skip play: when the user's skip play operation is detected, the video playing page can jump directly to the video clip the user cares about. In addition, a video tag can mark not only a moment in a video but also a time period in it, so the user can save a video clip with a video tag, add tag content to the clip, and use the tag for fast and convenient skip play.
In order to better implement the above method, an embodiment of the present application correspondingly further provides a tag processing apparatus, which may be integrated in an electronic device. Referring to fig. 11, the tag processing apparatus includes a display module 111, a generating module 112, and a displaying module 113, as follows:
the display module 111 is configured to display a video playing page corresponding to a target video in the video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing the video tag;
the generating module 112 is configured to generate, when a voice input operation for the voice input control is detected, a video tag corresponding to a current video time, where the video tag includes: the current video time and the video frame image information corresponding to the current video time;
and a displaying module 113, configured to display a video tag list when a tag viewing operation for the tag viewing control is detected, where the video tag list includes at least one generated video tag arranged in a predetermined order.
In an embodiment, the generating module 112 may include an obtaining submodule 1121 and a generating submodule 1122 as follows:
The obtaining submodule 1121 is configured to obtain, when a voice input operation for the voice input control is detected, the video frame image information corresponding to the current video time, the video unit to be marked corresponding to the current video time, and the currently recorded voice information;
and a generating sub-module 1122, configured to generate a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information.
In an embodiment, the acquisition submodule 1121 may include a first acquisition submodule 11211 and a second acquisition submodule 11212, as follows:
the first obtaining submodule 11211 is configured to obtain, when a voice input operation for the voice input control is detected, a current video time, video frame image information corresponding to the current video time, and voice information currently recorded for the voice input control;
and a second obtaining submodule 11212, configured to obtain, from the target video, a video unit to be marked corresponding to the current video time based on the voice information.
In an embodiment, the second obtaining submodule 11212 may be specifically configured to:
when the voice information is detected to comprise first tag type information, acquiring a current video frame image corresponding to the current video time from the target video;
And determining the current video frame image as a video unit to be marked.
In an embodiment, the second obtaining submodule 11212 may be specifically configured to:
when the voice information is detected to comprise second tag type information, determining an operation starting time point and an operation ending time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation start time point and the operation end time point;
the current video segment is determined as a video unit to be marked.
In an embodiment, the generating submodule 1122 may be specifically configured to:
when the voice information is detected to comprise the tag content voice information, converting the tag content voice information into tag content text information;
and generating a video tag corresponding to the video unit to be marked based on the tag content text information and the video frame image information.
In an embodiment, the obtaining submodule 1121 may be specifically configured to:
when voice input operation aiming at the voice input control is detected, acquiring current video time and video frame image information corresponding to the current video time;
Closing the audio in the target video, and playing the target video after closing the audio;
and acquiring voice information recorded aiming at the voice input control.
In an embodiment, the tag processing apparatus may further include a first obtaining module 114 and an arranging module 115, as follows:
a first obtaining module 114, configured to obtain video tags corresponding to a plurality of videos in a video set, where the video set includes videos corresponding to a plurality of levels;
the arrangement module 115 is configured to arrange a plurality of video tags based on the hierarchy of the video set and the current video time corresponding to the video tag, so as to obtain a video tag list.
In an embodiment, the tag processing apparatus may further include a determining module 116, a second acquiring module 117, and a jumping module 118, as follows:
a determining module 116, configured to determine, when a skip play operation for a target video tag in the video tag list is detected, a target video time corresponding to the target video tag and a video to be played corresponding to the target video tag;
a second obtaining module 117, configured to obtain a video clip to be played from the video to be played based on the target video time;
A skip module 118, configured to skip playing the video clip to be played.
In an embodiment, the obtaining submodule 1121 may be specifically configured to:
when voice input operation aiming at the voice input control is detected, detecting the login condition of a user aiming at the video playing client to obtain user login state information;
when the user login state information determines that the user has logged in the video playing client, video frame image information corresponding to the current video time, a video unit to be marked corresponding to the current video time and current recorded voice information are obtained.
In specific implementations, the above units may be implemented as independent entities, or combined arbitrarily and implemented as the same entity or several entities. For the specific implementation of each unit, reference may be made to the foregoing method embodiments, which are not repeated here.
As can be seen from the above, in the embodiments of the present application, the display module 111 may display the video playing page corresponding to the target video in the video playing client, the video playing page including a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, the generating module 112 generates a video tag corresponding to the current video time, the video tag including the current video time and the video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, the displaying module 113 displays a video tag list including at least one generated video tag arranged in a predetermined order. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing. Moreover, the user can add tag content to a video tag, so that when viewing the video tag list, the user can clearly and intuitively learn from the added tag content which content the user pays attention to, the user's viewing experience, how the plot develops, and so on, which facilitates plot review. Furthermore, the user can use the video tag list for skip play: when the user's skip play operation is detected, the video playing page can jump directly to the video clip the user cares about. In addition, a video tag can mark not only a moment in a video but also a time period in it, so the user can save a video clip with a video tag, add tag content to the clip, and use the tag for fast and convenient skip play.
The embodiment of the application also provides electronic equipment which can integrate any label processing device provided by the embodiment of the application.
For example, as shown in fig. 12, a schematic structural diagram of an electronic device according to an embodiment of the present application is shown, specifically:
the electronic device may include one or more processing cores 'processors 121, one or more computer-readable storage media's memory 122, power supply 123, and input unit 124, among other components. Those skilled in the art will appreciate that the electronic device structure shown in fig. 12 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or may be arranged in different components. Wherein:
the processor 121 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 122 and calling data stored in the memory 122, thereby performing overall processing of the electronic device. Optionally, processor 121 may include one or more processing cores; preferably, the processor 121 may integrate an application processor and a modem processor, wherein the application processor primarily handles an operating system, a user interface, an application program, etc., and the modem processor primarily handles wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 121.
The memory 122 may be used to store software programs and modules, and the processor 121 performs various functional applications and data processing by running the software programs and modules stored in the memory 122. The memory 122 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system, the application programs required for at least one function (such as a sound playing function, an image playing function, etc.), and the like, and the data storage area may store data created according to the use of the electronic device, and so on. In addition, the memory 122 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Accordingly, the memory 122 may also include a memory controller to provide the processor 121 with access to the memory 122.
The electronic device further comprises a power supply 123 for powering the various components. Preferably, the power supply 123 is logically connected to the processor 121 via a power management system, so that charging, discharging, and power consumption management functions are performed through the power management system. The power supply 123 may also include one or more of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and other such components.
The electronic device may further comprise an input unit 124, which input unit 124 may be used for receiving input digital or character information and for generating keyboard, mouse, joystick, optical or trackball signal inputs in connection with user settings and function control.
Although not shown, the electronic device may further include a display unit or the like, which is not described herein. In particular, in this embodiment, the processor 121 in the electronic device loads executable files corresponding to the processes of one or more application programs into the memory 122 according to the following instructions, and the processor 121 executes the application programs stored in the memory 122, so as to implement various functions as follows:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, wherein the video tag comprises the current video time and the video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, displaying a video tag list, wherein the video tag list comprises at least one generated video tag arranged in a predetermined order.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
As can be seen from the above, the embodiments of the present application may display a video playing page corresponding to a target video in a video playing client, where the video playing page includes a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, a video tag corresponding to the current video time is generated, the video tag including the current video time and the video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, a video tag list is displayed, the list including at least one generated video tag arranged in a predetermined order. With this scheme, video tags can be generated according to the user's operations during video playback, which improves the flexibility of tag processing. Moreover, the user can add tag content to a video tag, so that when viewing the video tag list, the user can clearly and intuitively learn from the added tag content which content the user pays attention to, the user's viewing experience, how the plot develops, and so on, which facilitates plot review. Furthermore, the user can use the video tag list for skip play: when the user's skip play operation is detected, the video playing page can jump directly to the video clip the user cares about. In addition, a video tag can mark not only a moment in a video but also a time period in it, so the user can save a video clip with a video tag, add tag content to the clip, and use the tag for fast and convenient skip play.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a computer-readable storage medium in which a plurality of instructions are stored, where the instructions can be loaded by a processor to perform the steps in any of the tag processing methods provided in the embodiments of the present application. For example, the instructions may perform the following steps:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control for triggering generation of a video tag and a tag viewing control for viewing video tags; when a voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, wherein the video tag comprises the current video time and the video frame image information corresponding to the current video time; and when a tag viewing operation for the tag viewing control is detected, displaying a video tag list, wherein the video tag list comprises at least one generated video tag arranged in a predetermined order.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
Because the instructions stored in the storage medium can perform the steps in any tag processing method provided in the embodiments of the present application, they can achieve the beneficial effects that any tag processing method provided in the embodiments of the present application can achieve; see the foregoing embodiments for details, which are not repeated here.
The tag processing method, apparatus, storage medium, and electronic device provided in the embodiments of the present application have been described above in detail. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and application scope according to the ideas of the present application. In summary, the contents of this specification should not be construed as limiting the present application.

Claims (11)

1. A tag processing method, comprising:
displaying a video playing page corresponding to a target video in a video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video tag and a tag viewing control used for viewing the video tag;
when voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, wherein the video tag comprises the current video time and video frame image information corresponding to the current video time;
sorting the video tags according to the current video time to generate a video tag list;
when a tag viewing operation for the tag viewing control is detected, displaying the video tag list, wherein the video tag list comprises at least one generated video tag arranged in a predetermined order, each video tag in the video tag list is followed by the video time corresponding to the video tag and the tag content corresponding to the video tag, and each video tag corresponds to a target video image;
when voice input operation for the voice input control is detected, generating a video tag corresponding to the current video time, including:
When a voice input operation for the voice input control is detected, acquiring the current video time, the video frame image information corresponding to the current video time, and the voice information currently recorded for the voice input control, and, if the voice information includes type information of a video tag to be constructed, acquiring, based on the voice information, the video unit to be marked corresponding to the current video time from the target video, wherein the video unit to be marked corresponds to the type information;
and generating a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information.
2. The tag processing method according to claim 1, wherein acquiring the video unit to be marked corresponding to the current video time from the target video based on the voice information comprises:
when the type information included in the voice information is detected to be first tag type information, acquiring a current video frame image corresponding to the current video time from the target video;
and determining the current video frame image as a video unit to be marked.
3. The tag processing method according to claim 1, wherein acquiring the video unit to be marked corresponding to the current video time from the target video based on the voice information comprises:
When the type information included in the voice information is detected to be the second tag type information, determining an operation starting time point and an operation ending time point corresponding to the voice input operation;
acquiring a current video clip from the target video based on the operation start time point and the operation end time point;
the current video segment is determined as a video unit to be marked.
4. The tag processing method according to claim 1, wherein generating a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information, comprises:
when the voice information is detected to comprise the tag content voice information, converting the tag content voice information into tag content text information;
and generating a video tag corresponding to the video unit to be marked based on the tag content text information and the video frame image information.
5. The tag processing method according to claim 1, wherein when a voice input operation for the voice input control is detected, acquiring a current video time, video frame image information corresponding to the current video time, and voice information currently recorded for the voice input control includes:
When voice input operation aiming at the voice input control is detected, acquiring current video time and video frame image information corresponding to the current video time;
closing the audio in the target video, and playing the target video after closing the audio;
and acquiring voice information recorded aiming at the voice input control.
6. The tag processing method according to claim 1, characterized in that the method further comprises:
acquiring video tags corresponding to a plurality of videos in a video set, wherein the video set comprises videos corresponding to a plurality of levels;
and arranging a plurality of video tags based on the hierarchy of the video set and the current video time corresponding to the video tags to obtain a video tag list.
7. The tag processing method according to claim 1, characterized in that the method further comprises:
when detecting skip play operation for a target video tag in the video tag list, determining target video time corresponding to the target video tag and video to be played corresponding to the target video tag;
acquiring a video clip to be played from the video to be played based on the target video time;
And skipping to play the video clip to be played.
8. The tag processing method according to claim 1, wherein when a voice input operation for the voice input control is detected, acquiring a current video time, video frame image information corresponding to the current video time, and voice information currently recorded for the voice input control includes:
when voice input operation aiming at the voice input control is detected, detecting the login condition of a user aiming at the video playing client to obtain user login state information;
when the user login state information determines that the user has logged in to the video playing client, acquiring the current video time, the video frame image information corresponding to the current video time, and the voice information currently recorded for the voice input control.
9. A tag processing apparatus, comprising:
the display module is used for displaying a video playing page corresponding to the target video in the video playing client, wherein the video playing page comprises a voice input control used for triggering generation of a video tag and a tag viewing control used for viewing the video tag;
The generation module is used for generating a video tag corresponding to the current video time when the voice input operation for the voice input control is detected, wherein the video tag comprises the current video time and video frame image information corresponding to the current video time; sorting the video tags according to the current video time to generate a video tag list;
the generating module is specifically configured to, when detecting a voice input operation for the voice input control, obtain a current video time, video frame image information corresponding to the current video time, and voice information currently recorded for the voice input control, and if the voice information includes type information of a video tag to be constructed, obtain, based on the voice information, a video unit to be marked corresponding to the current video time from the target video, where the video unit to be marked corresponds to the type information; generating a video tag corresponding to the video unit to be marked based on the voice information and the video frame image information;
the display module is used for displaying a video tag list when the tag viewing operation aiming at the tag viewing control is detected, wherein the video tag list comprises at least one generated video tag arranged according to a preset sequence, and the video time corresponding to the video tag and the tag content corresponding to the video tag are marked after each video tag in the video tag list and correspond to a target video image.
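A rough structural sketch of the apparatus of claim 9 as two cooperating interfaces; all names here are assumptions chosen for illustration, not the patent's own module definitions.

```typescript
// Illustrative sketch of the apparatus: one module renders the playing
// page and the tag list, the other turns voice input into video tags.
interface Tag { videoTime: number; thumbnailUrl: string; content: string; }

interface DisplayModule {
  showPlayingPage(videoId: string): void; // video playing page
  showTagList(tags: Tag[]): void;         // ordered video tag list
}

interface GenerationModule {
  // Produce a tag from the video time, frame image and recorded voice.
  generateTag(videoTime: number, frameUrl: string, audio: Blob): Promise<Tag>;
}
```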
10. A computer storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the tag processing method according to any one of claims 1 to 8.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 8 when executing the program.
CN202010030363.5A 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment Active CN111209437B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030363.5A CN111209437B (en) 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111209437A (en) 2020-05-29
CN111209437B (en) 2023-11-28

Family

ID=70787796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030363.5A Active CN111209437B (en) 2020-01-13 2020-01-13 Label processing method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111209437B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111654749B (en) * 2020-06-24 2022-03-01 百度在线网络技术(北京)有限公司 Video data production method and device, electronic equipment and computer readable medium
CN111783892B (en) * 2020-07-06 2021-10-01 广东工业大学 Robot instruction identification method and device, electronic equipment and storage medium
CN111741350A (en) * 2020-07-15 2020-10-02 腾讯科技(深圳)有限公司 File display method and device, electronic equipment and computer readable storage medium
CN112102823B (en) * 2020-07-21 2024-06-21 深圳市创维软件有限公司 Voice interaction method of intelligent terminal, intelligent terminal and storage medium
CN112887794B (en) * 2021-01-26 2023-07-18 维沃移动通信有限公司 Video editing method and device
CN114021060B (en) * 2021-11-10 2024-09-20 北京达佳互联信息技术有限公司 User tag display method and device, electronic equipment and storage medium
CN115494987A (en) * 2022-10-12 2022-12-20 腾讯科技(深圳)有限公司 Video-based interaction method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9396763B2 (en) * 2013-11-15 2016-07-19 Clipmine, Inc. Computer-assisted collaborative tagging of video content for indexing and table of contents generation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017096953A1 (en) * 2015-12-10 2017-06-15 乐视控股(北京)有限公司 Hot video displaying method and device
CN107071542A (en) * 2017-04-18 2017-08-18 百度在线网络技术(北京)有限公司 Video segment player method and device
CN108009293A (en) * 2017-12-26 2018-05-08 北京百度网讯科技有限公司 Video tab generation method, device, computer equipment and storage medium
CN108810637A (en) * 2018-06-12 2018-11-13 优视科技有限公司 Video broadcasting method, device and terminal device
CN109688475A (en) * 2018-12-29 2019-04-26 深圳Tcl新技术有限公司 Video playing jump method, system and computer readable storage medium
CN110297943A (en) * 2019-07-05 2019-10-01 联想(北京)有限公司 Adding method, device, electronic equipment and the storage medium of label

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Design and Implementation of an Android-based Video Playback System; Liu Liqiang; Liu Zhaohui; Journal of Hunan Institute of Science and Technology (Natural Science Edition) (No. 04); 42-48 *
Research on Video Resource Classification Based on Text Mining; Ai Lili; China Master's Theses Full-text Database, Information Science and Technology (No. 01); I138-2348 *

Also Published As

Publication number Publication date
CN111209437A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
CN111209437B (en) Label processing method and device, storage medium and electronic equipment
US11537267B2 (en) Method and device for search page interaction, terminal and storage medium
EP3855753A2 (en) Method and apparatus for locating video playing node, device and storage medium
CN110784752B (en) Video interaction method and device, computer equipment and storage medium
US9799375B2 (en) Method and device for adjusting playback progress of video file
CN110730387B (en) Video playing control method and device, storage medium and electronic device
US10600448B2 (en) Streaming digital media bookmark creation and management
US11609738B1 (en) Audio segment recommendation
CN112752121B (en) Video cover generation method and device
CN111800668B (en) Barrage processing method, barrage processing device, barrage processing equipment and storage medium
US10511704B2 (en) Method and device for displaying video information, and mobile terminal
CN109857901B (en) Information display method and device, and method and device for information search
CN111432288A (en) Video playing method and device, electronic equipment and storage medium
CN111654749A (en) Video data production method and device, electronic equipment and computer readable medium
CN112987996B (en) Information display method, information display device, electronic equipment and computer readable storage medium
US20230300429A1 (en) Multimedia content sharing method and apparatus, device, and medium
CN112911401A (en) Video playing method and device
CN111274449B (en) Video playing method, device, electronic equipment and storage medium
CN112929725B (en) Video distribution method, video playing method, electronic device and storage medium
US11863834B2 (en) Systems and methods for recommending content using progress bars
WO2024153191A1 (en) Video generation method and apparatus, electronic device, and medium
JP5343658B2 (en) Recording / playback apparatus and content search program
CN112052376A (en) Resource recommendation method, device, server, equipment and medium
JP5342509B2 (en) CONTENT REPRODUCTION DEVICE, CONTENT REPRODUCTION DEVICE CONTROL METHOD, CONTROL PROGRAM, AND RECORDING MEDIUM
CN116049490A (en) Material searching method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant