
CN115103213B - Information processing method, apparatus, device and computer readable storage medium - Google Patents

Information processing method, apparatus, device and computer readable storage medium

Info

Publication number
CN115103213B
CN115103213B (application CN202210659240.7A)
Authority
CN
China
Prior art keywords
text label
audio
video
information
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210659240.7A
Other languages
Chinese (zh)
Other versions
CN115103213A (en)
Inventor
蒋杰
殷杰
胥本海
魏婷
陈笑怡
黄舒婷
陈丽丽
马颖颖
曹程博
田昌勇
贺凤香
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Video Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd and MIGU Video Technology Co Ltd
Priority to CN202210659240.7A
Publication of CN115103213A
Application granted
Publication of CN115103213B
Legal status: Active

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/235Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/2353Processing of additional data, e.g. scrambling of additional data or processing content descriptors specifically adapted to content descriptors, e.g. coding, compressing or processing of metadata
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/233Processing of audio elementary streams
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Library & Information Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses an information processing method, apparatus, device and computer readable storage medium, wherein the method comprises the following steps: acquiring first audio and video information that corresponds to a first target video and is currently played; determining a first scene type corresponding to the first audio and video information, and acquiring a first scene application model corresponding to the first scene type; inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information; and sending the first text label and a first timestamp corresponding to the first audio and video information to the user terminal, wherein the user terminal displays the first text label and the first timestamp. By displaying text labels and their corresponding timestamps on the user terminal, a user searching for a video clip of live content can accurately locate the desired video content according to the timestamps and text labels, which reduces the time spent searching video content and improves video search efficiency.

Description

Information processing method, apparatus, device and computer readable storage medium
Technical Field
The present invention relates to the field of information processing technologies, and in particular, to an information processing method, apparatus, device, and computer readable storage medium.
Background
At present, when a user wants to play specific content within live content that has already been broadcast, the user mainly locates it by time-based retrieval or by gradually narrowing down according to the relevance of the video content.
When a user needs to locate the live content they want to watch, quick positioning is only possible if the user already knows related information about the current live content; otherwise, the user can only retrieve content by watching through the video, and obtaining the desired live content requires cumbersome operations. As a result, searching for a video clip of live content is time-consuming.
Disclosure of Invention
The invention mainly aims to provide an information processing method, an information processing device, information processing equipment and a computer readable storage medium, and aims to solve the problem that a user consumes long time when searching for video clips of live content.
In order to achieve the above object, the present invention provides an information processing method applied to a server, the information processing method including the steps of:
acquiring first audio and video information corresponding to a first target video and currently played;
determining a first scene type corresponding to the first audio and video information, and acquiring a first scene application model corresponding to the first scene type;
inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information, wherein the first text label is used for describing the audio and video content corresponding to the first audio and video information;
and sending the first text label and a first timestamp corresponding to the first audio and video information to a user terminal, wherein the user terminal displays the first text label and the first timestamp.
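The four claimed steps can be sketched as a minimal server-side pipeline. All function, parameter and class names below are illustrative assumptions, not taken from the patent; the scene classifier, scene models and terminal transport are passed in as stand-in callables:

```python
from dataclasses import dataclass

@dataclass
class TextLabel:
    text: str         # describes the audio/video content
    timestamp: float  # seconds into the live stream

def process_av_info(av_info, classify_scene, scene_models, send_to_terminals):
    """Sketch of the claimed method: determine the scene type, pick the
    matching scene application model, take its output as the text label,
    and push label + timestamp to the user terminals."""
    scene_type = classify_scene(av_info)               # determine first scene type
    model = scene_models[scene_type]                   # acquire scene application model
    label = TextLabel(model(av_info), av_info["timestamp"])  # model output = text label
    send_to_terminals(label)                           # terminals display label + timestamp
    return label
```

A usage sketch would supply a real classifier and pre-trained models where the lambdas stand in here.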
Further, the step of sending the first text label and the first timestamp corresponding to the audio/video information to a user terminal, where the user terminal displays the first text label and the first timestamp, includes:
if a second text label corresponding to the target video exists before the current moment, determining a display interval based on the first time stamp and the second time stamp corresponding to the second text label;
and sending the display interval, the first text label and the first timestamp corresponding to the audio and video information to a user terminal, wherein the user terminal displays the first text label and the first timestamp based on the display interval.
Further, if there is a second text label corresponding to the target video before the current time, determining the display interval based on the first timestamp and the second timestamp corresponding to the second text label includes:
and if the difference value between the first time stamp and the second time stamp is larger than a set threshold value, determining that the display interval is a segment interval, wherein the display interval comprises a segment interval and a standard interval, and the segment interval is larger than the standard interval.
Further, the information processing method includes the steps of:
receiving a third text label and a third timestamp corresponding to a second target video, wherein the third text label is generated by a server based on second audio/video information corresponding to the third text label through a corresponding second scene application model, the second scene application model is acquired by the server based on a scene type corresponding to the second audio/video information, and the second audio/video information is acquired by the server based on a second target video which is currently played;
and displaying the third text label and the third timestamp.
Further, the step of displaying the third text label and the third timestamp includes:
acquiring a second scene type of the third text label, and acquiring a first triggering frequency of an operation instruction corresponding to the second scene type;
and if the first triggering times reach the preset times, displaying the third text labels and the third time stamps based on preset display parameters.
Further, the information processing method further includes:
when detecting an operation instruction corresponding to a text label currently displayed by the user terminal, acquiring a third scene type of the text label corresponding to the operation instruction;
determining a second triggering frequency of the operation instruction corresponding to the third scene type;
and if the second triggering times reach the preset times, adjusting the text label currently displayed based on the preset display parameters.
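The trigger-count rule in the steps above amounts to a per-scene-type counter compared against a preset count. A minimal sketch, with a hypothetical threshold value and class name (neither is specified by the patent):

```python
PRESET_COUNT = 3  # hypothetical value for the "preset times"

class TriggerTracker:
    """Count operation instructions per scene type and report when the
    currently displayed labels of that type should be adjusted using
    the preset display parameters."""
    def __init__(self):
        self.counts = {}

    def record(self, scene_type):
        # Increment the trigger count for this scene type and return
        # whether it has reached the preset count.
        self.counts[scene_type] = self.counts.get(scene_type, 0) + 1
        return self.counts[scene_type] >= PRESET_COUNT
```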
Further, the information processing method further includes:
when a playback operation instruction corresponding to a text label currently displayed by the user terminal is detected, a fourth text label corresponding to the playback operation instruction is obtained;
determining playback starting time based on the time stamp corresponding to the fourth text label;
and executing the playback operation of the live content based on the playback start time.
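The playback steps above reduce to looking up the stored timestamp of the tapped label and seeking there. A sketch under assumed names (the label store and `seek` callable are stand-ins for whatever the terminal actually uses):

```python
def start_playback(label_store, seek, label_id):
    """On a playback instruction for a displayed text label, use the
    label's timestamp as the playback start time for the live content."""
    fourth_label = label_store[label_id]    # label tied to the playback instruction
    start_time = fourth_label["timestamp"]  # playback start time = label timestamp
    seek(start_time)                        # resume the live content from there
    return start_time
```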
In addition, in order to achieve the above object, the present invention also provides an information processing apparatus including:
the acquisition module is used for acquiring the first audio/video information which corresponds to the first target video and is currently played;
the determining module is used for determining a first scene type corresponding to the first audio and video information through a preset algorithm and acquiring a first scene application model corresponding to the first scene type;
the training module is used for inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information, wherein the first text label is used for describing the audio and video content corresponding to the first audio and video information;
and the sending module is used for sending the first text label and the first timestamp corresponding to the audio and video information to the user terminal, wherein the user terminal displays the first text label and the first timestamp.
In addition, to achieve the above object, the present invention also provides an information processing device, comprising: a memory, a processor, and an information processing program stored in the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method described above.
In addition, in order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, implements the steps of the foregoing information processing method.
The method comprises the steps of obtaining first audio and video information that corresponds to a first target video and is currently played; then determining a first scene type corresponding to the first audio and video information, and acquiring a first scene application model corresponding to the first scene type; inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information, wherein the first text label is used for describing the audio and video content corresponding to the first audio and video information; and then sending the first text label and the first timestamp corresponding to the first audio and video information to the user terminal, wherein the user terminal displays the first text label and the first timestamp. In this way, a corresponding text label can be generated from the first audio and video information and displayed, together with its timestamp, on the user terminal, so that when searching for a video clip of live content, the user can accurately locate the video content to be found according to the timestamp and the text label. This reduces the time spent searching video content, improves video search efficiency, and further improves user experience.
Drawings
FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first embodiment of an information processing method according to the present invention;
FIG. 3 is a flowchart of a third embodiment of an information processing method according to the present invention;
fig. 4 is a schematic functional block diagram of an embodiment of an information processing apparatus according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
FIG. 1 is a schematic diagram of an information processing apparatus in a hardware operating environment according to an embodiment of the present invention; the information processing device in the embodiment of the invention can be a PC or terminal equipment such as a smart phone.
As shown in fig. 1, the information processing apparatus may include: a processor 1001 such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002, wherein the communication bus 1002 is used to enable connected communication between these components. The user interface 1003 may include a display (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable, non-volatile memory such as disk storage. The memory 1005 may optionally also be a storage device separate from the processor 1001.
Optionally, the information processing device may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and the like. Of course, the information processing apparatus may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, and the like, which will not be described herein.
It will be appreciated by those skilled in the art that the terminal structure shown in fig. 1 does not constitute a limitation of the information processing apparatus, which may include more or fewer components than illustrated, combine certain components, or have a different arrangement of components.
As shown in fig. 1, an operating system, a network communication module, a user interface module, and an information processing program may be included in the memory 1005, which is one type of computer storage medium.
In the information processing apparatus shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server, and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be used to invoke the information processing program stored in the memory 1005.
In the present embodiment, an information processing apparatus includes: the information processing device comprises a memory 1005, a processor 1001 and an information processing program stored in the memory 1005 and capable of running on the processor 1001, wherein the processor 1001 calls the information processing program stored in the memory 1005 and executes the steps of the information processing method in the following embodiments.
The invention also provides an information processing method which is applied to the server of the live broadcast content, and referring to fig. 2, fig. 2 is a flow diagram of a first embodiment of the information processing method of the invention.
In this embodiment, the information processing method is applied to a server, and includes the following steps:
step S101, obtaining first audio/video information corresponding to a first target video and played currently;
in this embodiment, the server acquires first audio/video information corresponding to the first target video, where the first audio/video information may be an image frame of the first target video, may be voice information corresponding to the first target video, or may be bullet screen information sent by a viewer at the current moment corresponding to the first target video. The first target video may be a live video.
The server sends the first target video to each user terminal in real time, and the user terminal plays the first target video; that is, the first audio/video information may be a video clip, within a preset duration, of the first target video currently played by the user terminal. When the user terminal receives bullet screen information, it sends that information to the server, the server forwards it to each user terminal, and each user terminal scrolls the bullet screen information in the playing interface of the first target video; the first audio/video information thus also includes the bullet screen information at the current moment or within the preset duration.
Step S102, determining a first scene type corresponding to the first audio and video information, and acquiring a first scene application model corresponding to the first scene type;
In this embodiment, the server determines, according to the first audio and video information, a preset first scene type matching the first audio and video information through a preset algorithm. The preset algorithm may be an AI algorithm; through the AI algorithm, the scene type of the first audio and video information can be accurately identified from its image frames or speech. The first scene type may include, for example, a training scene, a variety scene, a martial arts scene, a costume drama scene, and the like. Specifically, the server identifies the image frames of the first audio and video information, and if the image frames include frames related to a fierce fight, the server determines that the first scene type is a martial arts scene; or the server identifies the voice information in the first audio and video information, and if the voice information includes speech related to fighting, martial arts and the like, the server determines that the first scene type is a martial arts scene.
Then, the server acquires a first scene application model corresponding to the first scene type. Specifically, mapping relations between each scene type and its corresponding scene application model are preset; after acquiring the first scene type, the server queries these mapping relations with the first scene type to determine the first scene application model.
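The preset mapping described above is essentially a lookup table from scene type to model. A minimal sketch, with hypothetical scene-type keys and stand-in callables in place of the pre-trained models:

```python
# Preset mapping between scene types and their scene application models.
# Each "model" here is a stand-in callable; in practice these would be
# pre-trained recognition models loaded at server startup.
SCENE_MODELS = {
    "martial_arts": lambda av_info: "fight label",
    "costume_drama": lambda av_info: "costume label",
}

def get_scene_model(scene_type):
    """Query the preset mapping with the first scene type; fail loudly
    if the scene type has no registered model rather than guessing."""
    try:
        return SCENE_MODELS[scene_type]
    except KeyError:
        raise ValueError(f"no scene application model registered for {scene_type!r}")
```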
It should be noted that the scene application model corresponding to each scene type is a pre-trained model; through the scene application model, the audio and video information can be accurately processed to obtain the text label corresponding to the scene type.
Step S103, inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information, wherein the first text label is used for describing the audio and video content corresponding to the first audio and video information;
In this embodiment, when the server obtains the first scene application model, the first audio and video information is input into the first scene application model, and the output of the first scene application model is used as a first text label corresponding to the first audio and video information. Specifically, the first audio and video information includes image frames of the first target video, voice information and bullet screen information. The first scene application model identifies the first audio and video information to obtain text information corresponding to the first text label; this text information includes one or more of text information corresponding to the image frames in the current live broadcast information, text information corresponding to the voice information, and text information corresponding to the bullet screen information. The first scene application model then generates the first text label from this text information, for example based on keywords in the text information, so that the first text label includes keywords of the text information such as names of characters, events, and the like.
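A hedged sketch of the label generation just described: merge the text recovered from image frames, speech, and bullet screens, then keep keywords. The keyword extractor here is a naive frequency count, purely illustrative of the keyword-based step, not the patent's actual model:

```python
from collections import Counter
import re

def generate_text_label(frame_text, speech_text, bullet_text, top_k=3):
    """Combine the three text sources produced by the scene application
    model's recognition stage and build a short label from the most
    frequent words (a stand-in for real keyword extraction)."""
    combined = " ".join(t for t in (frame_text, speech_text, bullet_text) if t)
    words = re.findall(r"\w+", combined.lower())
    keywords = [w for w, _ in Counter(words).most_common(top_k)]
    return " ".join(keywords)
```

In practice the label would come from the scene application model itself; this merely shows how the three text sources feed one label.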
Step S104, sending the first text label and a first timestamp corresponding to the first audio/video information to a user terminal, where the user terminal displays the first text label and the first timestamp.
In this embodiment, the server associates the first text label with a first timestamp corresponding to the first audio/video information, and sends the first text label and the first timestamp to each user terminal, where the user terminals are all user terminals currently playing the first target video, and each user terminal displays the first text label and the first timestamp. Specifically, a display interface for text labels is arranged on the playing interface of the first target video on the user terminal, and the user terminal displays the first text label and the first timestamp on this display interface. The display interface includes a function button; the user only needs to click the function button to trigger the function, whereupon the user terminal obtains the first text label from the server and displays it. Alternatively, the user terminal directly displays the first text label and the first timestamp upon receiving them.
In this way, by acquiring the first scene application model corresponding to the first scene type and obtaining the first text label through that model, an association between text labels and scene types is established. Compared with generating text labels through text recognition, speech recognition and similar means, the text label corresponding to the audio and video information can be obtained directly through the scene application model, which improves the convenience and efficiency of obtaining text labels. Meanwhile, because the association between text labels and scene types is established, a user of the user terminal can quickly query text labels of the same type according to scene type, improving user experience.
According to this embodiment, the first audio/video information that corresponds to the first target video and is currently played is obtained; a first scene type corresponding to the first audio/video information is then determined, and a first scene application model corresponding to the first scene type is acquired; the first audio/video information is input into the first scene application model, and the output of the first scene application model is taken as a first text label corresponding to the first audio/video information, wherein the first text label is used for describing the audio/video content corresponding to the first audio/video information; the first text label and the first timestamp corresponding to the first audio/video information are then sent to the user terminal, where the user terminal displays them. A corresponding text label can thus be generated from the first audio/video information and displayed with its timestamp on the user terminal, so that when searching for a video clip of live content, the user can accurately locate the desired video content according to the timestamp and text label. This reduces the time spent searching video content, improves video search efficiency, and further improves user experience.
Based on the first embodiment, a second embodiment of the information processing method of the present invention is proposed, in which step S104 includes:
step S201, if there is a second text label corresponding to the target video before the current time, determining a display interval between the first text label and the second text label based on the first time stamp and a second time stamp corresponding to the second text label;
step S202, sending the display interval, the first text label and the first timestamp corresponding to the audio/video information to a user terminal, where the user terminal displays the first text label and the first timestamp based on the display interval.
In this embodiment, when the first text label and the first timestamp are obtained, the server determines whether a second text label exists before the current moment. The second text label is a text label generated by the server during the current playing of the first target video; if a plurality of text labels have been generated during the current playing, the second text label is the last generated one, that is, the text label whose generation time has the smallest difference from the current moment. If the second text label exists before the current moment, a display interval is determined based on the first timestamp and the second timestamp corresponding to the second text label. The display interval includes a segment interval and a standard interval, both preset in the server, with the segment interval greater than the standard interval; the display interval is the interval, in the display interface of the user terminal, between the first text label and the second text label.
When the server obtains the display interval, it sends the display interval, the first text label and the first timestamp corresponding to the audio and video information to the user terminals. Each user terminal receives the display interval, the first text label and the first timestamp sent by the server and displays the first text label and the first timestamp based on the display interval; that is, the spacing between the first text label and the previous text label is the display interval, and the content displayed for the first text label is its corresponding text information. Specifically, when displaying the first text label and the first timestamp, the two may be shown in one row, for example with the first timestamp in front and the first text label behind, or in separate rows, for example with the first timestamp displayed first and the first text label in the row below it. The server sends first text labels in the time order of their corresponding timestamps; for example, when two text labels exist with timestamps of 2 hours 10 minutes 36 seconds and 2 hours 10 minutes 37 seconds, the label with timestamp 2 hours 10 minutes 36 seconds is sent first, and the other is sent afterwards.
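The chronological sending order described above amounts to sorting pending labels by timestamp before dispatch. A minimal sketch with assumed names, using the two-label example from the text:

```python
def send_labels_in_order(labels, send):
    """Send text labels to the terminals ordered by their timestamps,
    earliest first, matching the described sending order."""
    for label in sorted(labels, key=lambda l: l["timestamp"]):
        send(label)

# Timestamps in seconds: 2 h 10 min 37 s and 2 h 10 min 36 s; the
# 2 h 10 min 36 s label must go out first despite arriving second here.
pending = [
    {"text": "label B", "timestamp": 2 * 3600 + 10 * 60 + 37},
    {"text": "label A", "timestamp": 2 * 3600 + 10 * 60 + 36},
]
```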
Further, in an embodiment, step S201 includes: determining that the display interval is a segment interval if the difference between the first timestamp and the second timestamp is greater than a set threshold, where the display interval includes a segment interval and a standard interval, and the segment interval is greater than the standard interval.
In this embodiment, the server detects whether a second text label exists before the current time. If it does, the server further calculates the difference between the first timestamp and the second timestamp corresponding to the second text label, where the difference is the first timestamp minus the second timestamp. If the difference is greater than a set threshold, the display interval is determined to be the segment interval, where the segment interval is greater than the standard interval; the segment interval may be set as a multiple of the standard interval, for example 1.5 times or 2 times the standard interval, and the set threshold may be set to 1 second, for instance. Specifically, if after the first text label is determined its timestamp is 2 hours 10 minutes 38 seconds and the timestamp of the second text label is 2 hours 10 minutes 36 seconds, the difference between the two timestamps is 2 seconds, which is greater than the set threshold of 1 second, so the first text label is displayed with the segment spacing, that is, the space between the first text label and the second text label is the segment interval. The segment spacing can thus be accurately determined according to the time interval between text labels, and that time interval is reflected by the display spacing between the labels, which helps the user find video clips of the live content and improves user experience.
It should be noted that, if the second text label does not exist before the current time or the difference value is smaller than or equal to the set threshold value, the display interval is the standard interval.
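The interval rule above (segment interval when the timestamp gap exceeds the threshold, otherwise the standard interval) can be sketched as follows. The threshold of 1 second and the 2x segment interval come from the examples in the text; the concrete interval values and all names are illustrative assumptions.

```python
STANDARD_INTERVAL = 10    # display spacing between consecutive labels (illustrative units)
SEGMENT_INTERVAL = 20     # e.g. 2x the standard interval, per the example above
THRESHOLD_SECONDS = 1     # the set threshold from the example above

def display_interval(first_ts, second_ts=None):
    """Return the display spacing for the first text label.

    Timestamps are in seconds; `second_ts` is None when no second text
    label exists before the current time.
    """
    if second_ts is not None and first_ts - second_ts > THRESHOLD_SECONDS:
        return SEGMENT_INTERVAL
    return STANDARD_INTERVAL
```

With the example timestamps of 2h10m38s and 2h10m36s the 2-second gap exceeds the threshold, so the segment interval is used.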
In this embodiment, if a second text label corresponding to the target video exists before the current time, the display interval is determined based on the first timestamp and the second timestamp corresponding to the second text label; and then the display interval, the first text label and the first timestamp corresponding to the audio and video information are sent to the user terminal, wherein the user terminal displays the first text label and the first timestamp based on the display interval, the display interval can be determined according to the first text label and the second text label, the first text label is displayed by the user terminal based on the display interval, the time interval between the text labels is embodied by the display interval between the text labels, the user can conveniently find out the video segment of the live broadcast content, and the user experience is improved.
The present invention also provides an information processing method applied to a user terminal, referring to fig. 3, in a third embodiment of the information processing method of the present invention, the information processing method includes:
S301, receiving a third text label and a third timestamp corresponding to a second target video, wherein the third text label is generated by a server based on second audio/video information corresponding to the third text label through a corresponding second scene application model, the second scene application model is acquired by the server based on a scene type corresponding to the second audio/video information, and the second audio/video information is acquired by the server based on a second target video which is currently played;
S302, displaying the third text label and the third timestamp.
In this embodiment, the server acquires second audio/video information corresponding to the second target video, where the second audio/video information may be an image frame of the second target video, may be voice information corresponding to the second target video, or may be bullet screen information sent by the audience at the current moment corresponding to the second target video. The second target video may be a live video. The server is configured to send a second target video to each user terminal in real time, the user terminal plays the second target video, that is, the second audio/video information may be a video segment of the second target video currently played by the user terminal within a preset duration, when the user terminal receives the barrage information, the barrage information is sent to the server, the server sends the barrage information to each user terminal, each user terminal scrolls in a playing interface of the second target video to play the barrage information, and the second audio/video information further includes the barrage information within the current moment or the preset duration.
The server determines, through a preset algorithm and according to the second audio/video information, a preset second scene type matching the second audio/video information, where the second scene type may include a training scene, a variety scene, a martial-arts scene, a period-costume scene and the like. The server then acquires a second scene application model corresponding to the second scene type. Specifically, the mapping relation between each scene type and its corresponding scene application model is preset, and after the second scene type is acquired, the server queries this mapping relation with the second scene type to determine the second scene application model. The second audio/video information is input into the second scene application model, and the output of the second scene application model is taken as the third text label corresponding to the video information. Specifically, the second audio/video information includes an image frame, voice information and bullet-screen information of the second target video, and the second scene application model recognizes the second audio/video information to obtain text information corresponding to the third text label, which includes text information corresponding to the image frame, text information corresponding to the voice information and/or text information corresponding to the bullet-screen information in the current live broadcast; that is, the text information includes one or more of the three. The second scene application model then generates the third text label according to the text information, for example based on keywords in the text information, so that the third text label includes person names, event names and other keywords from the text information. Finally, the third text label and the third timestamp corresponding to the third text label are sent to the user terminal.
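The scene-type lookup and the keyword-based label generation might be sketched as below. The scene names follow the examples in the text; the mapping values and the keyword filter are stand-ins for the trained scene application models, not the patent's actual models.

```python
# Hypothetical preset mapping from scene type to its scene application model.
SCENE_MODELS = {
    "training": "training_label_model",
    "variety": "variety_label_model",
    "martial_arts": "martial_arts_label_model",
    "costume": "costume_label_model",
}

def get_scene_model(scene_type):
    """Query the preset mapping with the scene type to select the model."""
    return SCENE_MODELS[scene_type]

def generate_label(text_info, keywords):
    """Build a text label from the keywords that appear in the recognized text."""
    return " ".join(word for word in keywords if word in text_info)
```

For example, recognizing "Player Li scores in the final" against candidate keywords would keep only the keywords actually present in the text as the label content.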
Each user terminal receives the third text label and the third time stamp corresponding to the second target video, namely, the user terminal currently playing the second target video can receive the third text label and the third time stamp corresponding to the second target video, and the third text label and the third time stamp are displayed.
When the server generates the third text label, it determines whether a fourth text label exists before the current time (the time of acquiring the second audio/video information corresponding to the third text label), where the fourth text label is a text label generated by the server during the current playing of the second target video. If several text labels have been generated during the current playing, the fourth text label is the last one generated, that is, the generated text label whose time difference from the current time is smallest. If the fourth text label exists, the display interval between the third text label and the fourth text label is determined based on the third timestamp and the fourth timestamp corresponding to the fourth text label. Specifically, the server calculates the timestamp difference between the third timestamp and the fourth timestamp, where the timestamp difference is the third timestamp minus the fourth timestamp. If the timestamp difference is greater than a set threshold, the display interval is determined to be the segment interval, and the server sends the segment interval together with the third text label and the third timestamp corresponding to the second audio/video information to the user terminal; the user terminal displays the third text label and the third timestamp on its display interface based on the segment interval, that is, the space between the third text label and the fourth text label is the segment interval. If the timestamp difference is less than or equal to the set threshold, the server sends the third text label and the third timestamp corresponding to the second audio/video information to the user terminal, and the user terminal displays them with the standard interval. In this way the spacing before each text label accurately reflects the time interval between the pieces of live content the labels describe, so that the displayed labels are easy to scan and the user can quickly locate the corresponding content.
The user terminal is provided with a display interface of a third text label corresponding to the live broadcast content, and the third text label is displayed through the display interface, and specifically, second text information and a third time stamp of the third text label are displayed. The display interface can comprise a function button, a user can trigger a function by clicking the function button, and corresponding operation is carried out on live broadcast content through the text label.
According to this embodiment, a third text label and a third timestamp corresponding to the second target video are received, where the third text label is generated by a server based on second audio/video information corresponding to the third text label through a corresponding second scene application model, the second scene application model is obtained by the server based on a scene type corresponding to the second audio/video information, and the second audio/video information is obtained by the server based on the currently played second target video; the third text label and the third timestamp are then displayed. By displaying the text labels and their corresponding timestamps on the user terminal, a user searching for a video segment of the live content can accurately locate the desired video content according to the timestamp and the text label, which reduces the time spent searching for video content, improves search efficiency, and further improves the user experience.
Based on the third embodiment, a fourth embodiment of the information processing method of the present invention is proposed, in the present embodiment, the information processing method further includes:
Step S401, acquiring a second scene type of the third text label, and acquiring a first trigger count of an operation instruction corresponding to the second scene type;
Step S402, if the first trigger count reaches a preset count, displaying the third text label and the third timestamp based on preset display parameters.
In this embodiment, the server acquires the second scene type of the third text label and acquires the first trigger count of the operation instruction corresponding to the second scene type. When a click operation on the third text label is detected, the function selection buttons corresponding to the third text label are displayed; the function selection buttons may include a playback button, a copy button, a cancel button and other function buttons, and the playback button may include playback start and playback end. The user operates on the third text label by clicking one of the buttons. The user terminal determines the currently detected operation instruction from the function button corresponding to the detected click operation and accumulates the trigger count of the scene type corresponding to the third text label, thereby obtaining the first trigger count of the operation instruction corresponding to the second scene type; that is, the first trigger count is the number of times the user has selected third text labels of the same second scene type.
It is then judged whether the first trigger count reaches a preset count; if so, the third text label and the third timestamp are displayed based on preset display parameters, where the preset display parameters may specify highlighting, a color change, bold display and the like. Specifically, suppose the font of the third text label is black, not bold and not highlighted, the first trigger count is 3 and the preset count is 2. The server acquires that the second scene type of the third text label is a martial-arts scene, acquires the first trigger count of the operation instruction corresponding to the martial-arts scene, and judges that the first trigger count of 3 reaches the preset count of 2; the server then renders the font of the third text label and/or the third timestamp in bold, highlights it, and colors it red.
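A minimal sketch of this counting-and-emphasis logic follows. The preset count of 2 and the bold/highlight/red parameters mirror the example above; all names and the parameter dictionary shape are illustrative assumptions.

```python
trigger_counts = {}   # scene type -> accumulated operation-instruction count
PRESET_COUNT = 2      # the preset count from the example above

def record_trigger(scene_type):
    """Accumulate one operation instruction for the given scene type."""
    trigger_counts[scene_type] = trigger_counts.get(scene_type, 0) + 1
    return trigger_counts[scene_type]

def display_params(scene_type):
    """Return emphasized display parameters once the preset count is reached."""
    if trigger_counts.get(scene_type, 0) >= PRESET_COUNT:
        return {"bold": True, "highlight": True, "color": "red"}
    return {"bold": False, "highlight": False, "color": "black"}
```

A label's scene type is emphasized only after the user has operated on labels of that scene type the preset number of times.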
According to this embodiment, the second scene type of the third text label is acquired, along with the first trigger count of the operation instruction corresponding to the second scene type; then, if the first trigger count reaches the preset count, the third text label and the third timestamp are displayed based on the preset display parameters. Differentiated display of text labels according to how often they are operated on is thus realized through the preset display parameters, so that a user searching for a video segment of the live content can quickly locate the live content to be played back, which further reduces the time spent searching for live content, improves the efficiency of finding playback content, and further improves the user experience.
Based on the third embodiment, a fifth embodiment of the information processing method of the present invention is proposed, in the present embodiment, the information processing method further includes:
S501, when an operation instruction corresponding to a text label currently displayed by the user terminal is detected, acquiring a third scene type of the text label corresponding to the operation instruction;
S502, determining a second trigger count of the operation instruction corresponding to the third scene type;
S503, if the second trigger count reaches the preset count, adjusting the currently displayed text labels based on the preset display parameters.
In this embodiment, the user terminal detects in real time operation instructions corresponding to the text labels it currently displays, where the operation instructions may include functions such as playback start, playback end, copy and cancel. When an operation instruction corresponding to a currently displayed text label is detected, the user terminal acquires the third scene type of the text label corresponding to the operation instruction, and then acquires the second trigger count of the operation instruction corresponding to the third scene type; specifically, the user terminal reads the stored trigger count of the operation instruction corresponding to the third scene type and adds one to it to obtain the second trigger count.
The server then judges whether the second trigger count reaches the preset count; if so, the currently displayed text labels are adjusted according to the preset display parameters. Specifically, the user terminal determines the text labels to be processed among the currently displayed text labels, that is, those whose scene type is the third scene type, and adjusts their display parameters according to the preset display parameters, which may for example specify highlighting, a color change, bold display and the like. Suppose the server detects that the operation instruction corresponding to a currently displayed text label is playback start and acquires that the third scene type of that text label is the martial-arts scene; if the number of times the user has triggered playback start for the martial-arts scene is greater than the preset count of 2, the fonts of all martial-arts text labels among the currently displayed text labels are rendered bold, highlighted and colored red.
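The terminal-side adjustment of all displayed labels of a matching scene type could be sketched as follows. The label dictionary shape (`text`/`scene` keys) and the parameter dictionary are illustrative assumptions, not the patent's data model.

```python
def adjust_displayed_labels(labels, scene_type, params):
    """Apply display `params` to every currently displayed label of `scene_type`.

    Labels of other scene types are returned unchanged (copied), so only the
    matching scene type is visually differentiated.
    """
    return [dict(label, **params) if label["scene"] == scene_type else dict(label)
            for label in labels]
```

Only the labels whose scene type matches receive the new display parameters; the rest keep their original appearance.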
According to the method, when the operation instruction corresponding to the text label currently displayed by the user terminal is detected, the third scene type of the text label corresponding to the operation instruction is obtained, then the second trigger times of the operation instruction corresponding to the third scene type are determined, then if the second trigger times reach the preset times, the text label currently displayed is adjusted based on the preset display parameters, the text label can be displayed in a distinguishing mode, when a user searches for a video fragment of live broadcast content, live broadcast content needing to be played back can be positioned quickly, time consumption for searching for the live broadcast content is further reduced, searching efficiency of the played back content is improved, and user experience is further improved.
A sixth embodiment of the information processing method of the present invention is presented based on any one of the third to fifth embodiments, and in the present embodiment, the information processing method further includes:
S701, when a playback operation instruction corresponding to a text label currently displayed by the user terminal is detected, acquiring a fourth text label corresponding to the playback operation instruction;
S702, determining a playback start time based on the timestamp corresponding to the fourth text label;
S703, executing a playback operation of the live content based on the playback start time.
Specifically, when a clicking operation of the fourth text label currently displayed is detected, a function selection button corresponding to the fourth text label is displayed, wherein the function selection button can comprise a playback button, a copy button, a cancel button and other function buttons, the playback button can comprise a playback start button, a playback end button and the like, a user can operate the fourth text label by clicking one of the buttons, and when a playback operation instruction corresponding to the fourth text label currently displayed by the user terminal is detected, the fourth text label corresponding to the playback operation instruction is obtained.
Next, a playback start time is determined based on the timestamp corresponding to the fourth text label, that is, the timestamp corresponding to the fourth text label is used as the playback start time of the playback operation of the live content, and the playback operation of the live content is executed based on the playback start time. Preferably, the playback operation instruction may be triggered by two text labels in the currently displayed text labels, that is, the fourth text label may include two text labels, the timestamp corresponding to the fourth text label also includes two timestamps, the user terminal may use one of the timestamps as the playback start time and the other timestamp as the playback end time, further, the playback start time and the playback end time may be determined based on the timestamp corresponding to the fourth text label, and then, the playback operation of the live content is performed based on the playback start time and the playback end time.
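The selection of start and end times from one or two selected labels can be sketched as below. This is a hypothetical helper; the patent does not prescribe these names, and the rule that the earlier of two timestamps becomes the start is an assumption consistent with the description above.

```python
def playback_window(selected_timestamps):
    """Derive (start, end) playback times from 1 or 2 selected label timestamps.

    With one selected label only the start time is known (end is None); with
    two, the earlier timestamp is the start and the later one is the end.
    """
    if len(selected_timestamps) == 1:
        return selected_timestamps[0], None
    start, end = sorted(selected_timestamps[:2])
    return start, end
```

Triggering playback from a single label starts playback at that label's timestamp; selecting two labels replays only the span between them.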
If the operation instruction is a copy instruction, the user terminal determines a text label corresponding to the copy instruction, copies and caches the text label corresponding to the copy instruction, and uses the text label for other functions; and if the operation instruction is a cancel instruction, canceling the display of the function selection button.
In this embodiment, when a playback operation instruction corresponding to a text label currently displayed by the user terminal is detected, a fourth text label corresponding to the playback operation instruction is obtained, then a playback start time and a playback end time are determined based on a time stamp corresponding to the fourth text label, and then playback operation of the live content is performed based on the playback start time and the playback end time, so that the live content can be played back according to the time stamp corresponding to the fourth text label, the live content to be played back can be quickly located through the text label, time for searching the live content is further reduced, searching efficiency of the playback content is improved, and user experience is further improved.
The present invention also provides an information processing apparatus, referring to fig. 4, comprising:
the acquisition module 10 is configured to acquire first audio/video information corresponding to a first target video and being currently played;
The determining module 20 is configured to determine a first scene type corresponding to the first audio/video information through a preset algorithm, and obtain a first scene application model corresponding to the first scene type;
the training module 30 is configured to input the first audio and video information into the first scene application model, and use an output of the first scene application model as a first text label corresponding to the first audio and video information, where the first text label is used to describe audio and video content corresponding to the first audio and video information;
and the sending module 40 is configured to send the first text label and a first timestamp corresponding to the audio/video information to a user terminal, where the user terminal displays the first text label and the first timestamp.
Further, the sending module 40 is further configured to:
if a second text label corresponding to the target video exists before the current moment, determining a display interval based on the first time stamp and the second time stamp corresponding to the second text label;
and sending the display interval, the first text label and the first timestamp corresponding to the audio and video information to a user terminal, wherein the user terminal displays the first text label and the first timestamp based on the display interval.
Further, the sending module 40 is further configured to:
and if the difference value between the first time stamp and the second time stamp is larger than a set threshold value, determining that the display interval is the segmentation interval.
The present invention also provides an information processing apparatus including:
the receiving module is used for receiving a third text label and a third timestamp corresponding to a second target video, wherein the third text label is generated by a server based on second audio/video information corresponding to the third text label through a corresponding second scene application model, the second scene application model is obtained by the server based on a scene type corresponding to the second audio/video information, and the second audio/video information is obtained by the server based on the currently played second target video;
and the display module is used for displaying the third text label and the third timestamp.
Further, the information processing apparatus is further configured to:
acquiring a second scene type of the third text label, and acquiring a first triggering frequency of an operation instruction corresponding to the second scene type;
and if the first triggering times reach the preset times, displaying the third text label based on preset display parameters.
Further, the information processing apparatus is further configured to:
when an operation instruction corresponding to a text label currently displayed by the user terminal is detected, acquiring a third scene type of the text label corresponding to the operation instruction;
determining a second triggering frequency of the operation instruction corresponding to the third scene type;
and if the second triggering times reach the preset times, adjusting the text label currently displayed based on the preset display parameters.
Further, the information processing apparatus is further configured to:
when a playback operation instruction corresponding to a text label currently displayed by the user terminal is detected, a fourth text label corresponding to the playback operation instruction is obtained;
determining playback starting time based on the time stamp corresponding to the fourth text label;
and executing the playback operation of the live content based on the playback start time.
The method executed by each program unit may refer to each embodiment of the information processing method of the present invention, and will not be described herein.
In addition, an embodiment of the present invention also proposes an information processing apparatus including: a memory, a processor, and an information processing program stored on the memory and executable on the processor, where the information processing program, when executed by the processor, implements the steps of the information processing method described above.
Furthermore, the embodiment of the present invention also proposes a computer-readable storage medium having stored thereon an information processing program which, when executed by a processor, implements the steps of the information processing method as described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (9)

1. An information processing method, characterized by being applied to a server, comprising the steps of:
acquiring first audio and video information corresponding to a first target video and currently played, wherein the first audio and video information comprises barrage information corresponding to the first target video;
determining a first scene type corresponding to the first audio and video information, and acquiring a first scene application model corresponding to the first scene type;
inputting the first audio and video information into the first scene application model, and taking the output of the first scene application model as a first text label corresponding to the first audio and video information, wherein the first text label is used for describing audio and video contents corresponding to the first audio and video information;
the first text label and a first timestamp corresponding to the first audio and video information are sent to a user terminal, wherein the user terminal displays the first text label and the first timestamp;
The step of sending the first text label and the first timestamp corresponding to the audio/video information to a user terminal, wherein the step of displaying the first text label and the first timestamp by the user terminal includes:
if a second text label corresponding to the target video exists before the current moment, determining a display interval based on the first time stamp and the second time stamp corresponding to the second text label;
and sending the display interval, the first text label and the first timestamp corresponding to the audio and video information to a user terminal, wherein the user terminal displays the first text label and the first timestamp in a line mode based on the display interval.
2. The information processing method according to claim 1, wherein if there is a second text label corresponding to the target video before the current time, the step of determining a display pitch based on the first time stamp and a second time stamp corresponding to the second text label includes:
and if the difference value between the first time stamp and the second time stamp is larger than a set threshold value, determining that the display interval is a segment interval, wherein the display interval comprises a segment interval and a standard interval, and the segment interval is larger than the standard interval.
3. An information processing method, characterized by being applied to a user terminal, comprising the steps of:
receiving a third text label corresponding to a second target video, a third time stamp corresponding to the third text label and a display interval, wherein the third text label is generated by a server based on second audio-video information corresponding to the third text label through a corresponding second scene application model, the second scene application model is obtained by the server based on a scene type corresponding to the second audio-video information, the second audio-video information is obtained by the server based on a second target video which is currently played, and the display interval is determined by the server based on the third time stamp and a fourth time stamp corresponding to the fourth text label when a fourth text label corresponding to the second target video exists before the current moment, and the second audio-video information comprises bullet screen information corresponding to the second target video;
and displaying the third text labels and the third time stamps in a row based on the display interval.
4. The information processing method according to claim 3, wherein the step of displaying the third text label and the third time stamp includes:
Acquiring a second scene type of the third text label, and acquiring a first triggering frequency of an operation instruction corresponding to the second scene type;
and if the first triggering times reach the preset times, displaying the third text labels and the third time stamps based on preset display parameters.
5. The information processing method according to claim 3, characterized in that the information processing method further comprises:
when an operation instruction corresponding to a text label currently displayed by the user terminal is detected, acquiring a third scene type of the text label corresponding to the operation instruction;
determining a second triggering frequency of the operation instruction corresponding to the third scene type;
and if the second triggering times reach the preset times, adjusting the text label currently displayed based on the preset display parameters.
6. The information processing method according to any one of claims 3 to 5, characterized in that the information processing method further comprises:
when a playback operation instruction corresponding to a text label currently displayed by the user terminal is detected, acquiring a fourth text label corresponding to the playback operation instruction;
determining a playback start time based on the timestamp corresponding to the fourth text label;
and executing a playback operation of the live content based on the playback start time.
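The playback step of claim 6 can be sketched as follows. The lead-in offset is an assumption added for illustration (it gives the viewer a few seconds of context before the labeled moment); the claim itself only requires that the start time be determined from the label's timestamp.

```python
def playback_start_time(label_timestamp: float, lead_in: float = 5.0) -> float:
    """Claim 6: derive the playback start time of the live content from the
    timestamp of the text label selected by the playback operation
    instruction. `lead_in` is a hypothetical parameter, clamped so the
    start time never falls before the beginning of the stream."""
    return max(0.0, label_timestamp - lead_in)
```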
7. An information processing apparatus, characterized in that the information processing apparatus comprises:
an acquisition module, configured to acquire first audio-video information corresponding to a currently played first target video, wherein the first audio-video information comprises bullet screen information corresponding to the first target video;
a determining module, configured to determine a first scene type corresponding to the first audio-video information through a preset algorithm, and to acquire a first scene application model corresponding to the first scene type;
a training module, configured to input the first audio-video information into the first scene application model, and to take the output of the first scene application model as a first text label corresponding to the first audio-video information, wherein the first text label is used for describing the audio-video content corresponding to the first audio-video information;
a sending module, configured to send the first text label and a first timestamp corresponding to the first audio-video information to a user terminal, wherein the user terminal displays the first text label and the first timestamp;
wherein the sending module is further configured to: if a second text label corresponding to the first target video exists before the current moment, determine a display interval based on the first timestamp and a second timestamp corresponding to the second text label; and send the display interval, the first text label and the first timestamp corresponding to the first audio-video information to the user terminal, wherein the user terminal displays the first text label and the first timestamp on separate lines based on the display interval.
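The server-side pipeline described by the apparatus of claim 7 (scene type, then scene application model, then text label) can be sketched as below. The scene names, the keyword-based classifier, and the stand-in "models" are all hypothetical; the patent does not name concrete algorithms or model architectures.

```python
def classify_scene(audio_video_info: dict) -> str:
    """Stand-in for the 'preset algorithm' that determines the scene type,
    here illustrated as keyword matching on the bullet screen information."""
    comments = " ".join(audio_video_info.get("bullet_screens", []))
    return "sports" if "goal" in comments.lower() else "general"


# Hypothetical per-scene-type application models; the real models would map
# audio-video information to a descriptive text label.
SCENE_MODELS = {
    "sports": lambda info: "Goal highlight",
    "general": lambda info: "Notable moment",
}


def generate_label(audio_video_info: dict) -> str:
    """Claims 1/7 pipeline: scene type -> scene application model -> text label."""
    scene_type = classify_scene(audio_video_info)
    return SCENE_MODELS[scene_type](audio_video_info)
```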
8. An information processing apparatus, characterized in that the information processing apparatus comprises: a memory, a processor, and an information processing program stored on the memory and executable on the processor, wherein the information processing program, when executed by the processor, implements the steps of the information processing method according to any one of claims 1 to 2 or 3 to 6.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an information processing program which, when executed by a processor, implements the steps of the information processing method according to any one of claims 1 to 2 or 3 to 6.
CN202210659240.7A 2022-06-10 2022-06-10 Information processing method, apparatus, device and computer readable storage medium Active CN115103213B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210659240.7A CN115103213B (en) 2022-06-10 2022-06-10 Information processing method, apparatus, device and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN115103213A CN115103213A (en) 2022-09-23
CN115103213B (en) 2023-10-17

Family

ID=83290485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210659240.7A Active CN115103213B (en) 2022-06-10 2022-06-10 Information processing method, apparatus, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115103213B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106911967A (en) * 2017-02-27 2017-06-30 北京小米移动软件有限公司 Direct playing and playback method and device
CN107995515A (en) * 2017-11-30 2018-05-04 华为技术有限公司 The method and device of information alert
WO2019090653A1 (en) * 2017-11-10 2019-05-16 腾讯科技(深圳)有限公司 Method, apparatus and system for live video streaming
CN109922375A (en) * 2017-12-13 2019-06-21 上海聚力传媒技术有限公司 Event methods of exhibiting, playback terminal, video system and storage medium in live streaming
CN110163115A * 2019-04-26 2019-08-23 腾讯科技(深圳)有限公司 Video processing method and device, and computer readable storage medium
CN110213610A * 2019-06-13 2019-09-06 北京奇艺世纪科技有限公司 Live scene recognition method and device
CN110418157A * 2019-08-28 2019-11-05 广州华多网络科技有限公司 Live video playback method and device, storage medium and electronic equipment
WO2020192275A1 (en) * 2019-03-28 2020-10-01 北京达佳互联信息技术有限公司 Live broadcast playback video generation method, device and apparatus
CN112399258A (en) * 2019-08-13 2021-02-23 腾讯科技(深圳)有限公司 Live playback video generation playing method and device, storage medium and electronic equipment
CN112395420A (en) * 2021-01-19 2021-02-23 平安科技(深圳)有限公司 Video content retrieval method and device, computer equipment and storage medium
CN114205669A (en) * 2021-12-27 2022-03-18 咪咕视讯科技有限公司 Free visual angle video playing method and device and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Live Comments Emotional Analysis based on EE-RNN; P. Tan et al.; 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD); 352-359 *
Bullet screens as "place": users' virtual spatial practice; Zhao Yizhe et al.; 《新闻界》 (Issue 4, 2022); 32-41 *


Similar Documents

Publication Publication Date Title
CN110225369B (en) Video selective playing method, device, equipment and readable storage medium
CN111615003B (en) Video playing control method, device, equipment and storage medium
US20170289619A1 (en) Method for positioning video, terminal apparatus and cloud server
CN104066009B (en) program identification method, device, terminal, server and system
CN111010598B (en) Screen capture application method and smart television
CN112437353B (en) Video processing method, video processing device, electronic apparatus, and readable storage medium
CN110691281B (en) Video playing processing method, terminal device, server and storage medium
CN114450969B (en) Video screen capturing method, terminal and computer readable storage medium
CN108170266A (en) Smart machine control method, device and equipment
CN114025242A (en) Video processing method, video processing device and electronic equipment
CN112330371B (en) AI-based intelligent advertisement pushing method, device and system and storage medium
CN109104608B (en) Television performance test method, equipment and computer readable storage medium
CN112131466A (en) Group display method, device, system and storage medium
CN115103213B (en) Information processing method, apparatus, device and computer readable storage medium
CN111698261B (en) Video playing method, device, equipment and storage medium based on streaming media
CN109949887B (en) Consultation recording method, consultation recording device and computer readable storage medium
CN113709571B (en) Video display method and device, electronic equipment and readable storage medium
CN114554271B (en) Information pushing and displaying method and device, electronic equipment and storage medium
US10489460B2 (en) Method and apparatus for providing local search suggestion
CN116233554A (en) Video playing method, device, electronic equipment and storage medium
CN111918139B (en) Video playing method, device, equipment and storage medium
CN113537127A (en) Film matching method, device, equipment and storage medium
CN113965792A (en) Video display method and device, electronic equipment and readable storage medium
CN114827735A (en) Video review method, device, equipment and storage medium
CN113905125A (en) Video display method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant