
CN116304168A - Audio playing method, device, equipment, storage medium and computer program product - Google Patents

Audio playing method, device, equipment, storage medium and computer program product

Info

Publication number
CN116304168A
Authority
CN
China
Prior art keywords
audio
work
segment
playing
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111560546.9A
Other languages
Chinese (zh)
Inventor
刘凡绮
朱露
周广燕
张博文
赵长恩
高磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202111560546.9A
Publication of CN116304168A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/638Presentation of query results
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present application relates to an audio playing method, apparatus, computer device, storage medium and computer program product. The method concerns interface interaction during audio playback and comprises the following steps: displaying an audio playing interface of a pushed audio work and playing a preferential listening clip of the pushed audio work; displaying segment marks for each segment of the pushed audio work on a playing progress bar of the audio playing interface, where the segments of the pushed audio work include the preferential listening clip and each segment has a corresponding segment type; and, in response to a triggering operation on a target segment mark on the playing progress bar, playing the target segment of the pushed audio work that matches the target segment mark. By adopting the method, the efficiency of finding a desired audio work among a vast number of audio works can be improved.

Description

Audio playing method, device, equipment, storage medium and computer program product
Technical Field
The present invention relates to the field of computer technology, and in particular, to an audio playing method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
The sustained growth of the original-music market has given rise to a trading market for audio works, where producers and musicians can trade works such as Beats (sound recordings obtained by mixing and synthesizing instrument performances, human voices and other sounds; also called accompaniments). Musicians can then study, reference, or create new musical works based on a purchased Beat.
At present, a producer can upload produced audio works through platforms such as web pages and clients, and musicians with corresponding needs can audition works on those platforms, seeking works that match their own creative style or that spark creative inspiration. However, given the huge number of audio works on such platforms, how to help musicians efficiently select works they like from the mass of audio works and complete transactions is a problem the platforms have not addressed, resulting in low search efficiency for audio works.
Disclosure of Invention
In view of the foregoing, there is a need for an audio playback method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the efficiency of searching for audio works from among a vast number of audio works.
In a first aspect, the present application provides an audio playing method. The method comprises the following steps:
displaying an audio playing interface of the pushed audio work and playing a preferential listening clip in the pushed audio work;
displaying segment marks of all segments in the pushed audio work on a playing progress bar of the audio playing interface, wherein all segments in the pushed audio work comprise the preferential listening segment, and all segments have corresponding segment types;
and responding to the triggering operation of the target segment mark on the playing progress bar, and playing the target segment matched with the target segment mark in the pushed audio work.
In a second aspect, the present application further provides an audio playing device. The device comprises:
the playing module is used for displaying an audio playing interface of the pushed audio work and playing a priority listening clip in the pushed audio work;
the display module is used for displaying the segment marks of all the segments in the pushed audio work on the playing progress bar of the audio playing interface, wherein all the segments in the pushed audio work comprise the preferential listening segment, and all the segments have corresponding segment types;
And the segment switching module is used for responding to the triggering operation of the target segment mark on the playing progress bar and playing the target segment matched with the target segment mark in the pushed audio work.
In one embodiment, the audio playing device further comprises:
and the audio switching module is used for responding to the triggering operation of switching the audio in the audio playing interface and playing the preferential listening clip in the next pushed audio work.
In one embodiment, the audio switching module is further configured to display, in the audio playing interface, a card area corresponding to the currently played audio work; and responding to the triggering operation of switching the audio in the card area, stopping playing the preferential listening clip in the currently played audio work, and playing the preferential listening clip in the next pushed audio work.
In one embodiment, the audio playing device further includes a card changing module, configured to, in response to a triggering operation of switching audio in the card area, remove, from the audio playing interface, a card area corresponding to the currently played audio work; and displaying a card area corresponding to the next pushed audio work in the process of moving out the card where the played audio work is located from the audio playing interface.
In one embodiment, the card changing module is further configured to obtain a current card binary array, where a first element and a second element in the card binary array respectively represent, in an alternating manner, audio information of the currently played audio work and audio information of a next pushed audio work, where the audio information is used to render a card area corresponding to the audio work; when the audio information of the currently played audio work is derived from the first element in the card binary array, responding to the triggering operation of switching audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the second element in the card binary array, switching the audio work according to the second element in the card binary array for playing, and updating the first element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained; when the audio information of the currently played audio work is derived from the second element in the card binary array, responding to the triggering operation of switching the audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the first element in the card binary array, switching the audio work according to the first element in the card binary array for playing, and updating the second element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained.
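The alternating two-element card array in the embodiment above can be sketched as a small double buffer. This is a minimal illustration only; the names (`CardDeck`, `switch_audio`, `fetch_next_audio`) are assumptions, not taken from the application.

```python
# Minimal sketch of the "card binary array" double-buffering described above.
# All names (CardDeck, switch_audio, fetch_next_audio) are illustrative.

class CardDeck:
    """Holds audio info for the visible card and the prefetched next card."""

    def __init__(self, first_audio, second_audio):
        # The two elements alternately back the currently played work
        # and the next pushed work.
        self.slots = [first_audio, second_audio]
        self.current = 0  # index of the element rendering the visible card

    def switch_audio(self, fetch_next_audio):
        """On a 'switch audio' trigger: render and play from the other
        element, then refresh the stale element once the next pushed
        work's audio information arrives."""
        stale = self.current
        self.current = 1 - self.current
        now_playing = self.slots[self.current]
        self.slots[stale] = fetch_next_audio()  # prefetch for the next switch
        return now_playing
```

Switching twice alternates between the two elements, so under this scheme the interface never waits on the network before rendering the incoming card.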
In one embodiment, the audio playing device further comprises:
the audio work collection module is used for responding to the triggering operation of collecting audio in the audio playing interface and adding the currently played audio work to an audio collection list of a target account; and responding to the triggering operation of the audio collection icon in the audio playing interface, and displaying the audio collection interface where the audio collection list is located.
In one embodiment, the audio playing device further comprises:
the audio detail viewing module is used for responding to the triggering operation of viewing the audio details in the audio playing interface and displaying the audio detail interface of the currently played audio work;
the display module is further used for displaying a playing progress bar of the currently played audio work in the audio detail interface; and displaying the segment marks of all the segments in the currently played audio work on the playing progress bar of the audio detail interface.
In one embodiment, the playing module is further configured to stop playing the preferential listening clip in response to a triggering operation on a target segment mark on a playing progress bar of the audio detail interface, and play a target clip matching the target segment mark in the currently played audio work.
In one embodiment, the audio detail interface includes a trading value of the currently playing audio work; the audio playing device further includes:
the audio transaction module is used for responding to the triggering operation of the audio transaction in the audio detail interface and displaying the transaction interface of the audio work which is currently played; and responding to the triggering operation of submitting orders in the trading interface, and generating the trading orders of the currently played audio works based on the trading values.
In one embodiment, the playing module is further configured to:
and in response to the input of the target segment type from the audio playing interface, displaying the audio information of the pushed audio work on the audio playing interface, wherein the preferential listening segment in the pushed audio work is the segment of the target segment type.
In one embodiment, the audio playing device further comprises: a pushing module, used for acquiring historical preferential listening data corresponding to the target account; determining a preference segment type corresponding to the target account according to the historical preferential listening data; and pushing audio works whose preferential listening clip is of the preference segment type.
In one embodiment, the audio playing device further comprises:
The audio work searching module is used for displaying an audio searching interface of the audio work; and responding to the input of the search word in the search box of the audio search interface, and displaying the audio works matched with the search word in the search result area of the audio search interface.
In one embodiment, the audio work searching module is further configured to, in response to selecting a target segment type from a plurality of preset segment types displayed on the audio searching interface, display, in a search result area of the audio searching interface, an audio work that matches the search term and includes a segment of the target segment type.
In one embodiment, the audio work searching module is further configured to, in response to selecting a target tag from a plurality of preset filtering tags displayed on the audio searching interface, display an audio work matching the target tag in a search result area of the audio searching interface.
In one embodiment, the audio playing device further comprises:
the audio detail viewing module is used for responding to the triggering operation of viewing the audio details of the target audio works in the search result area and displaying an audio detail interface of the target audio works; the audio detail interface comprises a transaction numerical value of the target audio work;
The audio transaction module is used for responding to the triggering operation of the audio transaction in the audio detail interface and displaying the transaction interface of the target audio work; and responding to the triggering operation of submitting orders in the trading interface, and generating the trading orders of the target audio works based on the trading values.
In one embodiment, the playing module is further configured to display an audio playing interface of the pushed audio work in a child application, where the child application is an application program running in a parent application, and the parent application is associated with the target identifier;
the audio playing device further includes:
and the resource transfer module is used for carrying out resource transfer operation on the transaction account number associated with the target identifier according to the transaction numerical value of the audio work included in the transaction order.
In one embodiment, the audio playing device further comprises:
the preferential listening segment generation module, used for dividing the audio work into a plurality of equal-length segments; performing classification prediction on each segment of the audio work through a trained preferential listening segment prediction model to obtain a prediction result of whether each segment belongs to a preferential listening segment; and merging consecutive segments predicted as preferential listening segments according to the prediction results to obtain the preferential listening clip of the audio work.
In one embodiment, the preferential listening segment generation module is further configured to convert each segment into a corresponding spectrogram through the trained preferential listening segment prediction model, extract image features of the spectrogram, and output, based on the image features, a prediction result of whether the segment belongs to a preferential listening segment.
In one embodiment, the audio playing device further comprises a training module, used for dividing sample audio into a plurality of equal-length sample segments and determining labeling information on whether each sample segment belongs to a preferential listening segment; inputting each sample segment into a preferential listening segment prediction model, converting the sample segment into a corresponding spectrogram through the model, extracting image features of the spectrogram, and outputting, based on the image features, the predicted probability that the sample segment belongs to a preferential listening segment; and calculating a cross-entropy loss based on the labeling information of the sample segment and the predicted probability, and updating the model parameters of the preferential listening segment prediction model according to the cross-entropy loss to obtain the trained preferential listening segment prediction model.
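The cross-entropy loss in this training embodiment can be written out as a small numeric sketch; the spectrogram feature extractor is abstracted away, and the function names are illustrative assumptions.

```python
import math

# Numeric sketch of the binary cross-entropy loss used in this training
# embodiment; the spectrogram model itself is abstracted away and these
# function names are illustrative.

def binary_cross_entropy(label, predicted_prob, eps=1e-12):
    """label: 1 if the sample segment is annotated as a preferential
    listening segment, else 0; predicted_prob: model output in (0, 1)."""
    p = min(max(predicted_prob, eps), 1 - eps)  # clamp for numerical safety
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

def batch_loss(labels, probs):
    """Mean loss over a batch; its gradient is what would update the
    prediction model's parameters."""
    return sum(binary_cross_entropy(y, p)
               for y, p in zip(labels, probs)) / len(labels)
```

The loss is near zero when a confident prediction matches its label and grows without bound as a confident prediction contradicts it, which is what drives the parameter updates described above.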
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor realizes the steps of the audio playing method when executing the computer program.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the above-described audio playback method.
In a fifth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of the above-described audio playback method.
In the above audio playing method, device, computer equipment, storage medium and computer program product, the segments constituting an audio work include a preferential listening clip, and after the audio playing interface of a pushed audio work is displayed, that clip is played directly. Because the preferential listening clip is the segment the user is intended to hear first among the segments constituting the audio work, this improves the matching efficiency between audio works and the user who opened the audio playing interface, helping the user find works of interest as soon as possible. In addition, each segment constituting the audio work has a corresponding segment type, and the segment marks of each segment are displayed on the playing progress bar of the audio playing interface, so the user can trigger an operation through a mark on the progress bar to play segments of different types within the work. Thus, for each pushed audio work, only part of its clips need to be played for the user to appraise it rather than all of them; when the number of audio works is large, this greatly improves the matching efficiency between audio works and users who open the audio playing interface, and improves the efficiency of finding audio works among a massive collection.
Drawings
FIG. 1 is a diagram of an application environment of an audio playback method according to an embodiment;
FIG. 2 is a flow chart of an audio playing method according to an embodiment;
FIG. 3 is an interface diagram of an audio playback interface according to one embodiment;
FIG. 4 is a schematic diagram of an audio playback interface according to another embodiment;
FIG. 5 is a schematic diagram illustrating interface changes of an audio playback interface according to an embodiment;
FIG. 6 is a schematic diagram showing the relative positions of a first card and a second card according to one embodiment;
FIG. 7 is a schematic diagram of the two elements of the card binary array being used alternately in one embodiment;
FIG. 8 is a schematic diagram illustrating a variation of an audio playback interface according to another embodiment;
FIG. 9 is a schematic diagram of an interface change from an audio playback interface to an audio detail interface in one embodiment;
FIG. 10 is an interface diagram of an audio search interface in one embodiment;
FIG. 11 is a schematic diagram of filtering by tags in one embodiment;
FIG. 12 is a schematic diagram of an entry into an audio details interface from an audio search interface in one embodiment;
FIG. 13 is a flow diagram of predicting a preferential listening segment in an audio work in one embodiment;
FIG. 14 is a diagram of a network architecture of a preferential listening segment prediction model in one embodiment;
FIG. 15 is a schematic diagram of a data storage system for an audio work in one embodiment;
FIG. 16 is a block diagram of an audio playback apparatus according to an embodiment;
fig. 17 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Some terms or concepts referred to in this application are described below:
an audio work: three general categories can be distinguished. First kind, sound sampling, sound effect materials, field recording and other original recording fragments. Second, creation of songs, etc. Third category, lectures, audio books, drama, etc. In an embodiment of the present application, the audio work may be an original accompaniment (bean) recording in the first category. The audio works can be used for music people to study, reference or create a finished song for the second time after the transaction.
Beat: sound recordings made by mixing sounds performed using musical instruments (one or more musical instruments including, but not limited to, drums, synthesizers, etc.), human voices, and other sounds, typically include a plurality of audio tracks, primarily for accompaniment of human voices, such as, for example, musical accompaniment used by a talking singer.
Trading of audio works: after purchasing an audio work, a musician can use its audio tracks for study, reference, or secondary creation to obtain a finished song, for example releasing and commercially using a song after such 'collaborative' music creation. Producers form the main Beat-creation community.
Segment marking: the audio work is segmented according to the segment types of the segments it contains, and each segment is marked. For example, the structure of a Beat typically includes the following parts: Intro, Verse, Pre-hook, Hook, Bridge and Outro (ending). The Intro is the beginning of a Beat, typically four bars; the Hook usually appears in melodic form, typically eight bars, and is repeated multiple times throughout the song. The present application marks audio works segment by segment according to the segment types of the segments they contain, which effectively helps users screen audio works that match their intent, improves the screening efficiency of audio works, and can further improve trading efficiency in audio-work trading scenarios.
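As an illustration of segment marking, the marks shown on the progress bar might be represented as typed time ranges. The field names and times below are hypothetical, not taken from the application.

```python
# Hypothetical shape for segment marks on a playing progress bar; field
# names and times are illustrative, not taken from the application.

segment_marks = [
    {"type": "Intro",  "start": 0,   "end": 12},
    {"type": "Verse",  "start": 12,  "end": 45},
    {"type": "Hook",   "start": 45,  "end": 75},
    {"type": "Bridge", "start": 75,  "end": 95},
    {"type": "Hook",   "start": 95,  "end": 125},
    {"type": "Outro",  "start": 125, "end": 140},
]

def segment_for_mark(marks, tapped_type):
    """Return the first segment matching a tapped mark's type, so playback
    can seek to its start; None if no such segment exists."""
    for mark in marks:
        if mark["type"] == tapped_type:
            return mark
    return None
```

Under this shape, a tap on the "Hook" mark would seek playback to 45 seconds, the start of the first Hook segment.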
The audio playing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server.
In one embodiment, the terminal 102 may display an audio playing interface of the pushed audio work, play a preferential listening clip of the pushed audio work, and display the segment marks of each segment of the pushed audio work on a playing progress bar of the audio playing interface, where the segments of the pushed audio work include the preferential listening clip and each segment has a corresponding segment type; and, in response to a triggering operation on a target segment mark on the playing progress bar, play the target segment of the pushed audio work that matches the target segment mark.
In one embodiment, the terminal 102 has installed and running thereon a parent application, which is a native application running directly on the operating system, such as a social application, mail application, payment application, or gaming application, among others. Social applications, including instant messaging applications, SNS (Social Network Service, social networking site) applications, or live applications, among others. The instant messaging application is an application program supporting instant messaging, and can support instant messaging between users with friend relations and also support instant messaging between strange users. A child application is a particular application developed to achieve a target function, which is an application program that can run in the environment provided by the parent application, and in some examples, the child application may be an applet.
In one embodiment, the sub-application is an application that supports the trading of audio works. Optionally, the terminal may display an audio playing interface of the pushed audio work through the sub-application, play a preferential listening segment in the pushed audio work, display a segment mark of each segment in the pushed audio work on a playing progress bar of the audio playing interface, where each segment in the pushed audio work includes the preferential listening segment, each segment has a corresponding segment type, and play a target segment in the pushed audio work that matches the target segment mark in response to a triggering operation on the target segment mark on the playing progress bar.
In one embodiment, the terminal may further generate a trade order of the audio work currently played through the sub-application, and then call a resource transfer interface provided by the parent application server corresponding to the parent application, and perform a resource transfer operation on a trade account number associated with the target identifier used by the logged parent application according to a trade value of the audio work included in the trade order.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, an audio playing method is provided, and the method is applied to the terminal 102 in fig. 1 for illustration, and includes the following steps:
step 202, displaying an audio playing interface of the pushed audio work, and playing a preferential listening clip in the pushed audio work.
Wherein the audio work comprises at least one segment, and these segments together constitute the audio work. Preferably, the audio work comprises a plurality of segments.
Each of the segments constituting the audio work has a corresponding segment type; the segments may share the same segment type or have different segment types. Preferably, each segment has a different segment type.
The preferential listening clip is one or more of the segments constituting the pushed audio work; preferably, it is a single one of those segments. A preferential listening clip of an audio work is a clip that is played preferentially when the audio work is played. It may be understood as the segment with the highest playing priority among the segments of the audio work, though playing priority is introduced here only for illustration. In one embodiment, playing priorities may be assigned to the segments and the segment with the highest priority used as the preferential listening clip. In another embodiment, the preferential listening clip may be determined without assigning playing priorities; for example, a designated mark in the audio work may be identified and the segment corresponding to that mark used as the preferential listening clip. The designated mark may be a manual mark, such as one made by the creator or producer of the audio work, or an automatic mark; for example, in response to a target segment type currently selected by a user, a mark indicating the target segment type in the audio work may be identified and the corresponding segment used as the preferential listening clip of the current audio work.
In some embodiments, the preferential listening clip of the same audio work may differ at different stages; that is, the playback priority of the individual segments of the audio work may be variable rather than fixed, which is not limited herein.
In one embodiment, the preferential listening clip may be a clip that the producer of the audio work, or the owner intending to sell it, wants users (e.g., musicians) to hear first; after the audio work is pushed to the playback terminal, this clip is played before the other clips. The preferential listening clip may be actively marked, by its start and end time points, by the producer or the owner when the audio work is uploaded. For example, a single Beat may include segments of the types Intro, Chorus, Hook and Outro; among these, the segment of type Hook may be marked as the Beat's preferential listening clip. When no clip has been actively marked, or to achieve automatic marking, the preferential listening clip can also be identified automatically through a preferential listening segment prediction model.
The audio playing interface is a user interaction interface for playing automatically pushed audio works. After entering and displaying the audio playing interface of a pushed audio work, the terminal can immediately play the work's preferential listening clip; in this case, the preferential listening clip is the clip played first when the audio work is played in the audio playing interface. The pushed audio work may be the first audio work pushed after the terminal enters the audio playing interface, or a second, third, or subsequent audio work pushed in response to triggering operations for switching audio. It can be understood that, because the audio playing interface plays automatically pushed works, it suits users with vague creative needs who are browsing and seeking inspiration among massive audio works.
In one embodiment, the clip type of the preferential listening clip in the pushed audio work may be the clip type preferred by the current user. By acquiring the historical preferential listening data corresponding to the target account, the preference clip type corresponding to the target account is determined according to the historical preferential listening data, and an audio work whose preferential listening clip is of the preference clip type is searched from the audio work library as the pushed audio work. The target account is used for identifying the current user. Because the preferential listening clip directly played after each audio work is pushed is of the type the current user prefers, the matching degree between the user and the audio work can be improved, and the trading efficiency of the audio work is improved.
The historical preferential listening data corresponding to the target account can be determined according to the current user's triggering operations on the segment marks on the playing progress bar: the clip type of the clip the current user jumps to through such triggering operations indicates the current user's preference clip type. For example, if, each time after jumping to the Hook clip of a pushed audio work, the current user stays and trial-listens longer than on other clips, or the current user jumps to the Hook clip of pushed audio works more frequently than to other clips, the current user can be considered inclined to find the Hook clips of audio works, and the preference clip type of the current user is Hook. The historical preferential listening data corresponding to the target account can also be determined according to the clip types the current user has selected for the preferential listening clip on the audio search interface. For example, if, in the history of clip types selected by the current user for the preferential listening clip among the plurality of preset clip types displayed on the audio search interface, a target clip type accounts for a larger proportion than the other clip types, the target clip type can be determined to be the preference clip type of the current user.
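The preference inference described above can be sketched as a simple scoring of dwell time and jump frequency per clip type. The event schema, the weights, and the function name below are illustrative assumptions, not part of the embodiment.

```python
from collections import defaultdict

def infer_preference_type(listen_events):
    """Score each clip type by accumulated trial-listen duration and jump
    count, collected from the user's triggering operations on segment marks.

    listen_events: list of (clip_type, listen_seconds) tuples (hypothetical
    schema for the historical preferential listening data).
    """
    duration = defaultdict(float)
    jumps = defaultdict(int)
    for clip_type, seconds in listen_events:
        duration[clip_type] += seconds  # longer stay -> stronger preference
        jumps[clip_type] += 1           # more jumps -> stronger preference
    # Combine dwell time and jump frequency; the weight 10.0 is illustrative.
    score = {t: duration[t] + 10.0 * jumps[t] for t in duration}
    return max(score, key=score.get)

events = [("Hook", 25.0), ("Intro", 4.0), ("Hook", 30.0), ("Bridge", 6.0)]
print(infer_preference_type(events))  # -> Hook
```

A real system would of course weight and decay these signals differently; the point is only that both dwell duration and jump frequency feed the preference clip type.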
In some embodiments, the preference clip type of the current user may also be predicted by a machine-learning-based model based on, among others, the profile data of the current user, behavior data in the audio trading platform, play records, and transaction records for audio works.
In one embodiment, the audio playing method may further include: in response to input of a target clip type in the audio playing interface, presenting the audio information of the pushed audio work in the audio playing interface, where the preferential listening clip in the pushed audio work is a clip of the target clip type.
In this embodiment, a clip type selection control or a search box is set in the audio playing interface, so that the user can select the clip type of the preferential listening clip the user wants to hear. In this way, even though the audio works are automatically pushed, the preferential listening clip is of the target clip type selected by the current user, which can satisfy both the casual initial listening of the current user and the current user's preference, improve the effective recommendation rate of the audio works, and further improve the matching degree between the current user and the pushed audio works.
After entering the audio playing interface, the preferential listening clip in the pushed audio work can be played directly without redundant user operations, where the preferential listening clip is the clip that the producer or owner of the audio work intends the user to hear preferentially, and may be the most brilliant and attractive clip in the audio work. Therefore, for each pushed audio work, only part of the clips needs to be played to complete appreciation and trial listening of the audio work, without playing all clips; when the number of audio works is large, this can greatly improve the matching efficiency between the audio works and the users who open the audio playing interface, and improve the efficiency of finding audio works among massive audio works.
In one embodiment, after the terminal displays the audio playing interface, the server pushes the audio work and the work information of the audio work to the terminal online, and the terminal can display the work information of the pushed audio work through the audio playing interface while playing the preferential listening clip in the pushed audio work.
Optionally, after the terminal displays the audio playing interface of the pushed audio work, the terminal may generate an audio request based on the user identifier of the current login user, the terminal sends the audio request to the server, the server screens the audio work to be pushed matching with the user identifier from the audio work library based on the user identifier carried in the audio request, and the server may respond to the audio request and return the audio work and corresponding audio information to the terminal. The audio information includes work information of the audio work, such as a duration of the audio work, a trading value, a work name, an author, a cover map, and the like, and may further include a clip type of each clip in the audio work and a start-stop time point corresponding to each clip in the entire audio work.
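The audio information described above might be modeled roughly as follows. All field names and example values are hypothetical; the embodiment only enumerates the kinds of information carried (work information plus the clip type and start-stop time points of each clip).

```python
from dataclasses import dataclass, field

@dataclass
class Clip:
    clip_type: str   # e.g. "Intro", "Hook"
    start: float     # start time point within the whole work, in seconds
    end: float       # stop time point within the whole work, in seconds

@dataclass
class AudioInfo:
    work_name: str
    author: str
    duration: float          # total duration of the audio work, seconds
    trade_value: float       # trading value shown on the detail interface
    cover_url: str
    clips: list = field(default_factory=list)
    preferential_clip_index: int = 0  # which clip is played preferentially

# Hypothetical payload for one pushed audio work.
info = AudioInfo("Night Drive", "ProducerX", 180.0, 299.0, "cover.png",
                 clips=[Clip("Intro", 0.0, 15.0), Clip("Hook", 15.0, 45.0)],
                 preferential_clip_index=1)
```

With such a structure, the terminal can both render the work information and locate the preferential listening clip to start playback from.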
FIG. 3 is a schematic diagram of an audio playback interface according to one embodiment. Referring to fig. 3, the audio playing interface 300 displays the work information of the currently pushed audio work, including a cover map 302, a work name 304, and an author 306 of the audio work.
Step 204, displaying a segment mark of each clip in the pushed audio work on the playing progress bar of the audio playing interface, wherein the clips in the pushed audio work include the preferential listening clip, and each clip has a corresponding clip type.
As previously described, the audio work includes a plurality of clips, each having a clip type, and the preferential listening clip is one of the clips that make up the pushed audio work. The segment marks of the clips displayed on the playing progress bar are used for representing the start and stop time points corresponding to each clip, and can also be used for representing the relative position of each clip in the whole audio work.
In one application scenario, the producer or owner of an audio work may mark the audio work in segments when uploading it, that is, divide the audio work into a plurality of clips and mark the start and stop time points of each clip, and may also submit the clip type of each clip, such as Hook (chorus), Bridge, and the like. As mentioned above, the composition of a beat generally includes the following parts: Intro, Verse, Pre-chorus, Hook, Bridge, and Outro (ending). It can be understood that the clips in a single beat may cover only one or two of the clip types above, and not necessarily all of them. For example, the clips of a beat may include Intro, Hook, Bridge, and Outro. The producer or owner may also designate one of the clips as the preferential listening clip; in this case, the server may store, for the audio work, the start and stop time points of each clip submitted by the producer or owner.
In another scenario, when the producer or owner of the audio work does not mark the audio work in segments when uploading it, the server may divide the audio work into a plurality of clips using an audio segmentation model and obtain the start and stop time points of each clip, and the server may predict the preferential listening clip among these clips using the preferential listening clip prediction model, so as to obtain the start and stop time points corresponding to the preferential listening clip.
When the server pushes the audio work to the terminal, the start-stop time point of each clip and the start-stop time point corresponding to the preferential listening clip may be sent to the terminal as part of the audio information. The terminal can display the segment marks of the segments in the pushed audio work on the playing progress bar of the audio playing interface.
In one embodiment, the terminal may determine a ratio of a start-stop time point corresponding to each clip to a total duration of the audio work, determine a position point corresponding to each start-stop time point on the playing progress bar according to the total length of the playing progress bar and the ratio, and add a segment mark at the segment position point.
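The position calculation in this embodiment, the ratio of each clip's start time point to the total duration of the work, scaled by the total length of the playing progress bar, can be sketched as follows (the function name and pixel unit are illustrative):

```python
def mark_positions(clip_starts, total_duration, bar_length_px):
    """Map each clip's start time point (seconds) to an offset on the
    playing progress bar, by the ratio of start time to total duration."""
    return [round(start / total_duration * bar_length_px)
            for start in clip_starts]

# A 180 s audio work on a 300 px progress bar, with clips starting
# at 0 s, 15 s, 45 s, and 150 s:
print(mark_positions([0, 15, 45, 150], 180.0, 300))  # -> [0, 25, 75, 250]
```

The terminal can then draw a segment mark at each returned offset; the same ratio, applied in reverse, recovers the start time point when a mark is triggered.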
And step 206, in response to the triggering operation of the target segment mark on the playing progress bar, playing the target segment matched with the target segment mark in the pushed audio work.
Wherein each segment mark on the playing progress bar can be triggered. The triggering operation on the target segment mark may be a single-click operation, a double-click operation, a pressing operation, a sliding operation, or the like on the segment mark, which is not limited in the embodiments of the present application. When a segment mark is triggered by the user, the terminal can play the target clip in the audio work that takes the time point corresponding to the segment mark as its starting time point.
In one embodiment, after the terminal plays the preferential listening clip in the pushed audio work, if no triggering operation on a segment mark on the playing progress bar and no triggering operation of switching audio is detected when the preferential listening clip finishes playing, the terminal can continue to play the clip following the preferential listening clip, then the clip after that, and so on, until the pushed audio work finishes playing. Optionally, after the pushed audio work finishes playing, if no triggering operation on a segment mark on the playing progress bar and no triggering operation of switching audio is detected, the terminal may stop playing the audio work, or may start playing the pushed audio work again from the beginning.
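The continuation behavior just described, play the preferential listening clip first, then the following clips to the end of the work, then optionally restart from the beginning, can be sketched as an ordering of clip indices. `playback_order` and its parameters are illustrative assumptions.

```python
def playback_order(clips, preferential_index, restart=True):
    """Return the clip indices in playback order when no user trigger
    interrupts: the preferential listening clip, the clips after it to
    the end of the work, then (optionally) the whole work from the start."""
    order = list(range(preferential_index, len(clips)))
    if restart:
        order += list(range(len(clips)))  # replay the work from the beginning
    return order

print(playback_order(["Intro", "Verse", "Hook", "Outro"], 2))
# -> [2, 3, 0, 1, 2, 3]
```

A segment-mark trigger or an audio-switch trigger would simply abandon this default order at whatever point it occurs.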
Referring to fig. 3, in the audio playing interface 300, segment marks 308 of the clips in the currently played audio work are displayed, as well as a node 310 indicating the current playback progress. By clicking a segment mark 308, the user can directly play the clip that takes the time point corresponding to the segment mark 308 as its starting time point.
According to the audio playing method, the preferential listening clips are included in the clips forming the audio work, after the audio playing interface of the pushed audio work is displayed, the preferential listening clips in the pushed audio work are directly played, and the preferential listening clips are the clips which are intended to be heard preferentially by the user in the clips forming the audio work, so that the matching efficiency between the audio work and the user opening the audio playing interface can be improved, and the user can find the interested audio work as soon as possible. In addition, each segment forming the audio work has a corresponding segment type, and the segment mark of each segment in the audio work is displayed on the playing progress bar of the audio playing interface, so that a user can trigger an operation through the segment mark on the playing progress bar to play the segments of different segment types in the audio work. Therefore, for each pushed audio work, only part of clips need to be played to finish appreciation and listening of the audio work, all clips do not need to be played, and under the condition that the number of the audio works is large, the matching efficiency between the audio works and users who open the audio playing interface can be greatly improved, and the efficiency of searching the audio works from mass audio works is improved.
A seeking-and-exploring user is a type of user who aims to find inspiration through the audio playing interface in fragmented time and has no explicit target; such a user hopes that hearing a certain audio work will bring inspiration for the user's own creation. Specifically, the triggering operations of the seeking-and-exploring user on the audio work currently played in the audio playing interface can be divided into four categories. Optionally, a triggering operation in the audio playing interface may be a triggering operation on the card area corresponding to the currently played audio work.
First, the terminal sets the playing progress bar in the card area corresponding to the currently played audio work. As described earlier, when the user wants to keep listening, the terminal can, in response to a triggering operation on a target segment mark on the playing progress bar, stop playing the preferential listening clip and play the target clip matching the target segment mark in the pushed audio work, or reduce the playing volume of the currently played preferential listening clip while playing the target clip matching the target segment mark.
FIG. 4 is a schematic diagram of an audio playback interface according to one embodiment. Referring to fig. 4, in the audio playing interface 400, a card area 402 corresponding to the audio work currently being played is displayed, in which a playing progress bar, a cover chart of the audio work currently being played, a work name, an author name, and a segment mark corresponding to each segment are displayed on the playing progress bar.
Second, if the user does not like the currently played audio work, the terminal can play the preferential listening clip in the next pushed audio work in response to a triggering operation of switching audio.
In one embodiment, the audio playing method further comprises: and responding to the triggering operation of switching the audio in the audio playing interface, and playing the preferential listening clip in the next pushed audio work.
The foregoing describes that the audio playing interface is a user interaction interface for playing an automatically pushed audio work, and when the pushed audio work fails to excite the user's interest, the user can switch to play the next pushed audio work by switching the triggering operation of the audio. The triggering operation of switching the audio may be a sliding operation in the audio playing interface, such as a right sliding operation, a left sliding operation, an up sliding operation, a down sliding operation, etc., which is not limited in the embodiment of the present application.
For example, after the user listens to the preferential listening clip in the first pushed audio work, if the clip does not spark the user's creative inspiration, or the user has no intention of performing secondary creation in that style, the user can directly switch audio through a right-slide operation, and the terminal plays the preferential listening clip in the next pushed audio work fed back by the server. In addition, the terminal re-renders and displays the audio playing interface according to the audio information related to the next pushed audio work.
In one embodiment, in response to a triggering operation of switching audio in an audio playing interface, playing a preferred listening clip in a next pushed audio work, including: displaying a card area corresponding to the currently played audio work in the audio playing interface; and responding to the triggering operation of switching the audio in the card area, stopping playing the preferential listening section in the currently played audio work, and playing the preferential listening section in the next pushed audio work.
In this embodiment, by displaying the card area corresponding to the currently played audio work in the audio playing interface, when the audio work is switched every time, the audio switching from the visual and auditory aspects can be realized only by re-rendering and displaying the card area according to the audio information related to the next pushed audio work, the whole audio playing interface is not required to be re-rendered, the rendered data is reduced, and the efficiency of audio switching is improved. In addition, the user can realize the switching, collection, jumping to interested fragments and the like of the audio works only by carrying out corresponding triggering operation on the card area corresponding to the currently played audio works, the operation is very convenient, the seeking and exploring type user can find inspiration by utilizing the fragmentation time better, and the man-machine interaction experience between the user and the terminal is improved.
In one embodiment, the audio playing method further comprises: and responding to the triggering operation of switching the audio in the card area, moving out the card area corresponding to the currently played audio work from the audio playing interface, and displaying the card area corresponding to the next pushed audio work in the process of moving out the card where the played audio work is located from the audio playing interface.
Optionally, in the process of moving the card where the played audio work is located out of the audio playing interface, the card area corresponding to the next pushed audio work may be gradually displayed in the audio playing interface in a fade-in manner, until the card area corresponding to the currently played audio work is moved out entirely and the next pushed audio work is displayed entirely. Alternatively, the card area corresponding to the next pushed audio work may be displayed in a direct display manner, that is, only when the card area corresponding to the currently played audio work has been moved out entirely is the card area corresponding to the next pushed audio work displayed in the audio playing interface. Of course, the terminal can also directly cancel the display of the card area corresponding to the currently played audio work rather than moving it out gradually, and directly display the card area corresponding to the next pushed audio work afterwards.
FIG. 5 is a schematic diagram of interface changes of an audio playing interface according to an embodiment. Referring to fig. 5, in part (a) of fig. 5, the audio information related to the currently played audio work is displayed in the audio playing interface. In part (b) of fig. 5, when the user triggers a right-slide operation, the card area corresponding to the currently played audio work is gradually moved out and the next pushed audio work is gradually displayed, until the card area corresponding to the currently played audio work is moved out entirely and the next pushed audio work is displayed entirely. Referring to part (c) of fig. 5, the next pushed audio work then becomes the latest currently played audio work, and the terminal re-renders and displays the audio information related to it in the card area.
In order to ensure that, each time the user switches audio through a sliding operation, the related information of the next pushed audio work can be displayed and the next pushed audio can be played instantly, thereby reducing the delay caused by rendering and starting playback, the terminal can, upon each detected triggering operation of switching audio, display and play the next pushed audio work and pull a new audio work in advance.
In one embodiment, the audio playing method further comprises: acquiring a current card binary array, wherein a first element and a second element in the card binary array respectively represent the audio information of the current played audio work and the audio information of the next pushed audio work alternately, and the audio information is used for rendering a card area corresponding to the audio work; when the audio information of the currently played audio work is derived from the first element in the card binary array, responding to the triggering operation of switching the audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the second element in the card binary array, switching the audio work according to the second element in the card binary array for playing, and updating the first element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained; when the audio information of the currently played audio work is derived from the second element in the card binary array, responding to the triggering operation of switching the audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the first element in the card binary array, switching the audio work according to the first element in the card binary array for playing, and updating the second element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained.
Specifically, in order to accurately display the card area corresponding to the currently played audio work and the card area corresponding to the next pushed audio work in the audio playing interface, the terminal may maintain a card binary Array, where the first element Array[0] and the second element Array[1] in the card binary Array alternately represent the audio information of the currently played audio work and the audio information of the next pushed audio work, and the audio information stored in the elements is used for rendering the card areas corresponding to the audio works. For convenience of description, the card area corresponding to the currently played audio work is denoted as card one, and the card area corresponding to the next pushed audio work is denoted as card two.
FIG. 6 is a schematic diagram showing the relative positions of card one and card two in one embodiment. Referring to fig. 6, card one corresponds to the currently played audio work, that is, the card the user slides away when switching audio, and card two corresponds to the next pushed audio work, that is, the new card presented after the user slides card one away.
The two elements in the card binary Array alternately represent the audio information of the currently played audio work and the audio information of the next pushed audio work, instead of always keeping Array[0] for card one and Array[1] for card two and performing pop and push operations on the Array with the newly pulled audio work data. In the latter way, while the user slides card one away, card two (always corresponding to Array[1]) is within the visible range and undergoes a data change; for the user, the content of card two mutates after card one slides away, causing a card-flicker phenomenon.
For example, assume that card one always corresponds to the first element Array[0] of the card binary Array and card two always corresponds to the second element Array[1]. In the initial state the value of the card binary Array is [A, B], and the newly pulled audio work data is C. After card one slides away, push and pop operations are performed on the card binary Array, and the state of the Array changes as [A, B] -> [A, B, C] -> [B, C], while the cards change as [card one, card two] -> [blank, card two] -> [card one, card two]. During this process the content of card two changes from corresponding to B to corresponding to C while card two is visible, which appears as a content mutation.
By adopting the alternating representation, the push and pop operations do not need to be executed; only alternating representation and alternating updates are needed. When the audio information of the currently played audio work is derived from the first element of the card binary Array, that is, when card one corresponds to the first element Array[0], if the terminal receives a triggering operation of switching audio, card one switches to correspond to the second element Array[1], and the second element Array[1] is used to render and update the card area corresponding to the currently played audio work; in addition, the audio information of the next pushed audio work is pulled and stored in the first element Array[0]. When the audio information of the currently played audio work is derived from the second element of the card binary Array, that is, when card one corresponds to the second element Array[1], if the terminal receives a triggering operation of switching audio, card one switches to correspond to the first element Array[0], and the first element Array[0] is used to render and update the card area corresponding to the currently played audio work; in addition, the audio information of the next pushed audio work is pulled and stored in the second element Array[1].
FIG. 7 is a schematic diagram of the two elements of the card binary Array alternately corresponding to the cards. Referring to fig. 7, at initialization, card one corresponds to the first element Array[0] of the card binary Array and card two corresponds to the second element Array[1]. After the user slides card one away, the original card two becomes the new card one, and the new card one corresponds to the second element Array[1]; after the first element Array[0] is updated, the new card two corresponds to the first element Array[0]. When the user slides again, card one then corresponds to the first element Array[0], and after the second element Array[1] is updated, card two corresponds to the second element Array[1]. And so on, ensuring that card two, which lies beneath card one, does not change after each slide, so the user never sees card two flicker.
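A minimal sketch of this alternating two-slot buffer: during a swipe, the slot backing the visible underlying card is never mutated, which is exactly how the flicker described above is avoided. The class and method names are assumptions for illustration.

```python
class CardDeck:
    """Two-slot buffer whose elements alternately hold the current card's
    audio information and the next card's; avoids the pop/push mutation
    that would change the underlying card while it is visible."""

    def __init__(self, current_info, next_info):
        self.array = [current_info, next_info]
        self.current = 0  # index of the slot backing card one

    def swipe(self, newly_pulled_info):
        """User slides card one away: the other slot becomes current
        (it backed the visible card two, untouched), and the freed slot
        is overwritten with the newly pulled audio work's information."""
        freed = self.current
        self.current = 1 - self.current
        now_playing = self.array[self.current]
        self.array[freed] = newly_pulled_info  # update only after the swipe
        return now_playing

deck = CardDeck("A", "B")
print(deck.swipe("C"))  # -> B; array is now ["C", "B"]
print(deck.swipe("D"))  # -> C; array is now ["C", "D"]
```

Note that each swipe only writes the slot that has just left the screen, so the content beneath the sliding card is stable, matching the behavior illustrated in fig. 7.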
Third, if the user likes the currently played audio work, the currently played audio work can be collected online through a triggering operation of collecting audio.
In one embodiment, the terminal adds the currently played audio work to the audio collection list of the target account in response to a triggering operation of collecting audio in the audio playing interface, and displays the audio collection interface where the audio collection list is located in response to the triggering operation of the audio collection icon in the audio playing interface.
For example, the user can realize switching playing of the next pushed audio work through a right-sliding operation in the audio playing interface, realize online collection of the currently played audio work through a down-sliding operation in the audio playing interface, and display an audio detail interface of the currently played audio work through a clicking operation in the audio playing interface.
Fig. 8 is a schematic diagram illustrating a change of an audio playing interface in one embodiment. Referring to fig. 8 (a), in the audio playing interface 800, below the card area 802 corresponding to the currently played audio work, an audio collection icon 804 is displayed, when the user slides down the card area, referring to fig. 8 (b), after the "added to audio collection list" prompt message is popped up, the audio playing interface 800 shown in fig. 8 (c) is displayed again, and at this time, numbers may be marked on the audio collection icon 804 to indicate the number of audio works collected in the audio collection list. The user may also enter an audio collection interface 820, shown in part (d) of FIG. 8, in which an audio collection list is displayed, by clicking on the audio collection icon 804. In the audio collection interface 820, the user may also remove a particular audio work from the audio collection list by a delete operation. The user can enter the audio detail interface of a certain audio work through clicking operation on the audio work in the audio collection list, so that the detail of the audio work or the transaction of the audio work can be checked.
In this embodiment, when a user likes an audio work currently played, online collection of the audio work can be achieved only by a simple trigger operation.
Fourth, if the user is interested in the currently played audio work, the user can learn the detailed information of the audio work through a triggering operation of viewing the audio details.
In one embodiment, in response to a triggering operation of viewing audio details in an audio playing interface, an audio detail interface of a currently played audio work is displayed, a playing progress bar of the currently played audio work is displayed in the audio detail interface, and segment marks of all segments in the currently played audio work are displayed on the playing progress bar of the audio detail interface.
That is, the terminal marks each clip of the audio work on the playing progress bar, and the segment marks appear not only in the audio playing interface but also in the audio detail interface of the audio work. In fact, on any interface that includes a playing progress bar of the audio work, the terminal may display the segment marks on the playing progress bar, and the segment marks can be triggered to jump to the different clips for playing.
The triggering operation for viewing the audio details in the audio playing interface can be clicking operation on a card area corresponding to the audio work currently played in the audio playing interface. The terminal may display an icon in the audio playback interface for accessing the audio detail interface, such as "purchase", which the user may access by clicking on.
In one embodiment, the audio playing method further comprises: and responding to the triggering operation of the target segment mark on the playing progress bar of the audio detail interface, stopping playing the preferential listening segment, and playing the target segment matched with the target segment mark in the currently played audio work.
Specifically, when a certain segment mark on the playing progress bar of the audio detail interface is triggered by a user, the terminal can play a target segment taking a time point corresponding to the segment mark as a starting time point in the audio work.
In one embodiment, the audio detail interface includes a transaction value of the currently played audio work, and the audio playing method further includes: and responding to the triggering operation of the audio transaction in the audio detail interface, displaying the transaction interface of the audio work which is currently played, responding to the triggering operation of submitting the order in the transaction interface, and generating the transaction order of the audio work which is currently played based on the transaction numerical value.
When the user is interested in the currently played audio work, the user may enter the audio detail interface from the audio playing interface and view the audio detail information there. The audio detail interface not only includes the playing progress bar of the currently played audio work, but also includes each version of the audio work and the corresponding transaction values; the user may select a version as needed to conduct an online transaction of the audio work and generate a transaction order for the currently played audio work.
As shown in fig. 9, an interface change diagram from an audio playing interface to an audio detail interface is shown in one embodiment. Referring to fig. 9 (a), for an audio work currently played in the audio playing interface 900, the user may enter the audio detail interface 920 of the audio work by clicking "buy", and as shown in fig. 9 (b), the audio detail interface 920 may include a playing progress bar of the audio work, where the playing progress bar also displays segment marks of each segment of the audio work for the user to operate. The audio detail interface 920 further includes different versions of the audio work and corresponding transaction values, and the user may select the corresponding versions to perform online transactions on the audio work as required, as shown in fig. 9 (c) and 9 (d).
In the above embodiment, the terminal provides a quick and efficient audio searching experience for the seeking-type user by directly playing the preferential listening segment and displaying the segment marks on the playing progress bar. In addition, the terminal sets four different operations on the audio playing interface according to four different psychological needs of the seeking-type user, so that the seeking-type user can operate as needed. This improves the matching efficiency between audio works and seeking-type users, helps such users make better use of fragmented time to find inspiration, and improves the human-computer interaction experience between the user and the terminal.
In the embodiment of the application, an implementation manner for screening audio works is also provided for the purpose-specific user. The purpose-specific user is a user who has a relatively clear idea of the creator, name, style, transaction value, or conveyed emotion of the audio work being sought. Such a user can find an audio work of interest by simply entering a search term or selecting a filter tag.
In one embodiment, the audio playing method further comprises: and displaying the audio search interface of the audio work, and responding to the input of the search word in the search box of the audio search interface, and displaying the audio work matched with the search word in the search result area of the audio search interface.
The audio search interface is a user interaction interface for searching for an audio work. The audio search interface comprises a search box, and the terminal can acquire the search words input in the search box and display the audio works matched with the search words in a search result area of the audio search interface.
In one embodiment, the audio playing method further comprises: and responding to the selection of the target fragment type from a plurality of preset fragment types displayed on the audio search interface, and displaying the audio work which is matched with the search word and comprises the fragments of the target fragment type in a search result area of the audio search interface.
The foregoing describes that when a producer or owner of an audio work uploads the audio work, the audio work may be divided into a plurality of pieces and the piece type of each piece may be submitted. In order to enable the user with definite purposes to more accurately match the audio works, the terminal can display the preset fragment types in an audio search interface, the user can select the fragment type of the preferential listening fragment during screening, and then label screening is carried out, so that the searched audio works are the audio works with the fragments of the fragment type.
FIG. 10 is an interface diagram illustrating an audio search interface, in one embodiment. Referring to fig. 10 (a), in the audio search interface 1000, a search box 1002 is provided for a user to input a search word, and a "listen preferentially" selection box 1004 is provided for the user to select a target segment type from a plurality of preset segment types. After the user clicks the "listen preferentially" selection box 1004, a plurality of preset segment types pop up, including: Intro, Chorus, Pre-Chorus, Hook, Bridge, and Outro. Upon selection of "Hook" by the user, referring to part (b) of FIG. 10, an audio work that matches the search term and includes a "Hook" segment is presented in the search result area 1006.
In one embodiment, the audio playing method further comprises: and responding to the triggering operation of the search result area for viewing the audio details of the target audio works, displaying an audio detail interface of the target audio works, and automatically playing the fragments of the target fragment types in the target audio works in the audio detail interface.
In this embodiment, for an audio work that is displayed and matches with a search term and includes a segment of a target segment type, a user may select a certain audio work from the audio work, enter an audio detail interface of the audio work, and in the audio detail interface, play the segment of the target segment type in the audio work preferentially for the user, that is, for the searched audio work, default to play from the segment corresponding to the segment type selected by the user.
In order to enable the user with definite purposes to match the audio works more accurately, the terminal can display a screening label in the audio search interface, and the user can screen out the matched audio works by selecting the label.
In one embodiment, the audio playing method further comprises: and responding to the selection of the target label from a plurality of preset screening labels displayed on the audio search interface, and displaying the audio works matched with the target label in a search result area of the audio search interface.
Wherein, the plurality of preset screening tags can be roughly divided into "music theory tags", "demand tags", and "reference tags". Music theory tags cover music knowledge aspects of an audio work and may include "style", "BPM" (Beats Per Minute), "emotion", and so on. Demand tags cover the user's purchasing demand dimensions and may include "authorized version", "audio duration", "transaction price", etc. Reference tags are derived from data on how audio works have been purchased and may be various sorting tags, such as: "comprehensive ranking", "play volume", "collection volume", "praise volume", "sales volume", "latest upload", "price ranking", and the like.
It will be appreciated that the target tags selected by the user may include multiple tags. If multiple tags are selected from the same class of screening tags, for example the selected "style" tags include both "hip-hop" and "R&B", then the audio works presented by the terminal in the search result area should match any one of the selected target tags. If tags are selected from different classes of screening tags, for example the selected style is "hip-hop" and the BPM is "100-120", the displayed audio works must satisfy both screening tags simultaneously.
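The OR-within-a-class, AND-across-classes rule can be sketched as follows. This is a minimal illustration under simplifying assumptions: tags are treated as plain strings (so a BPM range such as "100-120" matches only as an exact tag, not numerically), and each work carries a hypothetical mapping from tag class to its tag set.

```python
def match_work(work_tags, selected):
    """work_tags / selected: mapping from tag class (e.g. "style", "bpm")
    to a set of tag strings. Within one class at least one selected tag
    must match (OR); across classes every class must match (AND)."""
    for tag_class, wanted in selected.items():
        # OR within a class: the work must carry at least one wanted tag
        if not (wanted & work_tags.get(tag_class, set())):
            return False  # AND across classes: any failing class rejects the work
    return True

works = [
    {"style": {"hip-hop"}, "bpm": {"100-120"}},
    {"style": {"R&B"},     "bpm": {"120-140"}},
    {"style": {"pop"},     "bpm": {"100-120"}},
]
selected = {"style": {"hip-hop", "R&B"}, "bpm": {"100-120"}}
matches = [w for w in works if match_work(w, selected)]
```

With this filter, only the first work matches: the second fails the BPM class and the third fails the style class, consistent with the rule described above.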
As shown in fig. 11, a schematic diagram of a screening tag in one embodiment. Referring to fig. 11, part (a) is a "style" tag, such as hip-hop, trap, R & B, etc. Referring to fig. 11 (b), a "BPM" label, for example, the BPM value may be between 0 and 300, and the user may select as desired, such as selecting 100 to 180. Referring to fig. 11, part (c), an "authorized" tag, for example: an audition version, a basic authorization version, a WAVE (a standard digital audio file) value-added version, a track-divided value-added version, a permanent authorization version, and so forth. Referring to fig. 11 (d), a "duration" label, for example, the duration may be given as: within one minute, 1-3 minutes, 3-5 minutes, 5-8 minutes, 8-10 minutes, 10 minutes or more, etc. The "sort" tag in FIG. 11 may include, for example: comprehensive ordering, play volume, collection volume, praise volume, sales volume, latest upload, price ordering, and the like.
In one embodiment, the audio playing method further comprises: the method comprises the steps of responding to triggering operation of viewing audio details of a target audio work in a search result area, displaying an audio detail interface of the target audio work, wherein the audio detail interface comprises transaction values of the target audio work, responding to triggering operation of audio transaction in the audio detail interface, displaying a transaction interface of the target audio work, responding to triggering operation of submitting an order in the transaction interface, and generating a transaction order of the target audio work based on the transaction values.
The foregoing describes that the terminal can respond to a triggering operation, such as a clicking operation, by a seeking-type user on the card area corresponding to the currently played audio work, display the audio detail interface of the audio work, and support further operations by the user to generate a transaction order. In this embodiment, the terminal may likewise display the audio detail interface of a target audio work in response to a triggering operation by a purpose-specific user on the target audio work displayed in the search result area, and similarly support further operations in the audio detail interface to generate a transaction order.
FIG. 12 is a schematic diagram of an interface change from an audio search interface to an audio detail interface, in one embodiment. Referring to part (a) of fig. 12, for a target audio work 1204 displayed in a search result area 1202 of the audio search interface 1200, the user may enter an audio detail interface 1220 of the target audio work by a clicking operation. As shown in part (b) of fig. 12, the audio detail interface 1220 may include a playing progress bar of the audio work, on which segment marks of each segment of the audio work are also displayed for the user to operate. The audio detail interface 1220 also includes different versions of the audio work and the corresponding transaction values, and the user may select a version to trade the audio work online as desired, as shown in parts (c) and (d) of FIG. 12.
In one embodiment, an audio playback interface for presenting a pushed audio work, comprises: displaying an audio playing interface of the pushed audio work in a child application, wherein the child application is an application program running in a parent application, and the parent application is associated with a target identifier; after generating the trade order of the currently played audio work based on the trade value, the audio playing method further comprises the following steps: and carrying out resource transfer operation on the transaction account number associated with the target identifier according to the transaction numerical value of the audio work included in the transaction order.
In this embodiment, a process of searching for an audio work of interest of a user and performing a transaction is implemented through a sub-application running on a parent application, screening of the audio work of the transaction can be implemented through the sub-application, and payment of the audio work of the transaction is implemented through a transaction account number associated with a target identifier of a current login parent application, so that convenience in transaction of the audio work is improved through the whole process.
In one embodiment, the terminal or server may also automatically identify the preferential listening segment in an audio work by means of a trained preferential listening segment prediction model. For example, when a user uploads an audio work, the model can automatically identify the preferential listening segment in the audio work for the user to confirm; when the user confirms the upload, the start and end time points of the preferential listening segment in the whole audio work are submitted to the server together and stored by the server. For another example, when the user does not mark the preferential listening segment when uploading the audio work, the server may automatically identify the preferential listening segment through the trained preferential listening segment prediction model and record its start and end time points in the whole audio work.
In one embodiment, the step of generating the preferential listening segment in the audio work includes: dividing the audio work into a plurality of equal-length segments; performing classification prediction on each segment of the audio work through a trained preferential listening segment prediction model to obtain a prediction result of whether each segment belongs to the preferential listening segment; and merging a plurality of continuous preferential listening segments in the audio work according to the prediction results to obtain the preferential listening segment in the audio work.
The trained preferential listening segment prediction model is a model based on a neural network. Because the training samples processed by the preferential listening segment prediction model during training are audio segments, when predicting on an audio work, the terminal needs to divide the audio work into a plurality of equal-length segments, and for each segment, perform classification prediction through the preferential listening segment prediction model to obtain a prediction result. There are two kinds of prediction results: belonging to the preferential listening segment, and not belonging to the preferential listening segment.
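The equal-length division can be sketched as follows. This is a minimal illustration; the `stride` parameter is an assumption added so that one helper covers both the overlapping (sliding-window) and non-overlapping divisions used in the examples of this embodiment.

```python
def split_into_windows(total_len, win_len, stride=None):
    """Divide an audio work of total_len seconds into equal-length
    windows of win_len seconds. With stride < win_len the windows
    overlap (sliding window); with the default stride the windows are
    back to back, and a shorter tail is kept so the whole work is covered."""
    stride = stride or win_len
    windows, start = [], 0
    while start + win_len <= total_len:
        windows.append((start, start + win_len))
        start += stride
    # keep a (possibly shorter) tail window if the audio end is not yet covered
    if not windows or windows[-1][1] < total_len:
        windows.append((start, total_len))
    return windows
```

For an 8-second work, `split_into_windows(8, 5, 1)` yields the overlapping windows (0,5), (1,6), (2,7), (3,8), and `split_into_windows(8, 3)` yields (0,3), (3,6), (6,8).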
In one embodiment, the terminal may merge consecutive segments classified as preferential listening segments in the audio work to obtain the preferential listening segment of the audio work. Alternatively, the terminal may merge only the longest run of consecutive preferential listening segments into the result. That is, the terminal determines the longest continuous portion of the audio classified as preferential listening segments and takes that portion as the preferential listening segment in the audio work. Optionally, the terminal may further correct discontinuous results to improve the accuracy of the prediction: for a target segment predicted by the model as not belonging to the preferential listening segment, if a plurality of continuous segments before the target segment and a plurality of continuous segments after it are all preferential listening segments, the classification result of the target segment is corrected, and the target segment is determined to belong to the preferential listening segment.
For example, the terminal divides an audio work with a total length of 8 seconds into 5-second segments with a 1-second stride, obtaining 4 segments: the segment of seconds 0-5, the segment of seconds 1-6, the segment of seconds 2-7, and the segment of seconds 3-8. Each of the four segments is classified by the preferential listening segment prediction model as belonging or not belonging to the preferential listening segment. If the segment of seconds 1-6 and the segment of seconds 2-7 belong to preferential listening segments, then seconds 1-7 after merging form the preferential listening segment in the audio work.
For another example, the terminal divides an audio work with a total length of 8 seconds into 3-second segments, obtaining 3 segments: the segment of seconds 0-3, the segment of seconds 3-6, and the segment of seconds 6-8. Each of the 3 segments is classified by the preferential listening segment prediction model as belonging or not belonging to the preferential listening segment. If the segment of seconds 0-3 and the segment of seconds 3-6 belong to preferential listening segments, then seconds 0-6 after merging form the preferential listening segment in the audio work.
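The merging of positive segments into one time range can be sketched as an interval union over the windows classified as preferential listening segments. This is a minimal illustration; taking the longest merged range as the final result is one of the options described in this embodiment.

```python
def merge_positive_windows(windows, predictions):
    """windows: list of (start, end) times; predictions: 1 = classified
    as a preferential listening segment, 0 = not. Overlapping or adjacent
    positive windows are merged, and the longest merged range is returned
    as the preferential listening segment (as [start, end])."""
    positive = [w for w, p in zip(windows, predictions) if p == 1]
    merged = []
    for start, end in sorted(positive):
        if merged and start <= merged[-1][1]:          # overlaps or touches
            merged[-1][1] = max(merged[-1][1], end)    # extend the last range
        else:
            merged.append([start, end])
    return max(merged, key=lambda r: r[1] - r[0]) if merged else None
```

With the 5-second overlapping windows, positives at seconds 1-6 and 2-7 merge to seconds 1-7; with the 3-second windows, positives at seconds 0-3 and 3-6 merge to seconds 0-6, matching the two examples above.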
For another example, an audio work is divided into 15 segments, and the classification prediction results corresponding to the segments are 0,0,1,1,1,1,1,0,1,1,1,1,1,0,0, where 1 indicates belonging to the preferential listening segment and 0 indicates not belonging to it. The prediction result of the 8th segment indicates that it does not belong to the preferential listening segment, yet the 5 consecutive segments before it and the 5 consecutive segments after it all belong to the preferential listening segment, so the prediction result of the 8th segment may be corrected to: belonging to the preferential listening segment. The 3rd through 13th segments are then merged, and the result is taken as the preferential listening segment in the audio work. That is, when a target segment predicted as not belonging to the preferential listening segment has N consecutive preferential listening segments on both its left and right, the target segment is corrected to a preferential listening segment. The value of N may be determined according to the actual effect, which is not limited in this embodiment. For example, N may be 5.
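The correction and longest-run merging described in this example can be sketched as follows. This is a minimal illustration; for brevity, segments flipped early in the pass may count toward the flanking check of later segments.

```python
def smooth_and_merge(preds, n=5):
    """Correct isolated negatives: a segment predicted 0 with n
    consecutive 1s on both its left and right is flipped to 1. Then the
    longest run of consecutive 1s is returned as the preferential
    listening segment, as 1-based inclusive segment indexes."""
    preds = list(preds)
    for i, p in enumerate(preds):
        if p == 0:
            left = preds[max(0, i - n):i]
            right = preds[i + 1:i + 1 + n]
            if len(left) == n and len(right) == n and all(left) and all(right):
                preds[i] = 1  # isolated negative flanked by n positives per side
    best_len, best_range, run = 0, None, 0
    for i, p in enumerate(preds):
        run = run + 1 if p == 1 else 0
        if run > best_len:
            best_len, best_range = run, (i - run + 2, i + 1)  # 1-based, inclusive
    return preds, best_range
```

On the example sequence with N = 5, the 8th segment is flipped to 1 and the merged result covers the 3rd through 13th segments.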
In one embodiment, performing classification prediction on each segment of the audio work through the trained preferential listening segment prediction model to obtain a prediction result of whether each segment belongs to the preferential listening segment includes: converting the segment into a corresponding spectrogram through the trained preferential listening segment prediction model, extracting image features of the spectrogram, and outputting a prediction result of whether the segment belongs to the preferential listening segment based on the image features.
In one embodiment, the trained preferential listening segment prediction model includes a preprocessing layer, a feature extraction layer, and a classifier. The preprocessing layer is configured to eliminate noise in the segment and then generate the spectrogram of the segment. Eliminating noise reduces the complexity of the audio, which improves the computational efficiency of the model and reduces the influence of irrelevant audio on the model's discrimination ability. The audio segment is converted into a spectrogram, and the spectrogram is input as a picture into the subsequent layers for feature extraction and classification prediction. The feature extraction layer mainly comprises convolution layers and pooling layers connected to the convolution layers, and is used to extract the image features of the spectrogram; the classifier comprises fully connected layers and a dropout layer (used for random suppression of neurons), and outputs the final prediction result based on the image features.
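The conversion of an audio segment into a spectrogram "picture" can be sketched with a short-time Fourier transform. This is a minimal NumPy sketch under stated assumptions: the frame length, hop size, and Hann window are illustrative choices, and the magnitude (not log-mel) spectrogram is used for simplicity.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Convert a 1-D audio segment into a magnitude spectrogram:
    frame the signal, apply a Hann window, and take the magnitude of
    the FFT of each frame. The (n_frames, n_bins) result can be fed to
    the feature extraction layer as a single-channel image."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len//2 + 1)

# one second of a 440 Hz tone sampled at 8 kHz, as a stand-in audio segment
t = np.arange(8000) / 8000.0
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
```

For the 440 Hz tone, the energy concentrates near bin 440 / (8000/256) ≈ 14, which is a quick sanity check that the frequency axis is laid out as expected.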
FIG. 13 is a flow diagram of predicting the preferential listening segment in an audio work, according to one embodiment. Referring to fig. 13, first, the audio work is divided into a plurality of segments. Each segment is processed in turn by the trained preferential listening segment prediction model: the preprocessing layer eliminates noise in the segment and generates its spectrogram, the spectrogram is input into the feature extraction layer to obtain its image features, and finally the classifier outputs the prediction result corresponding to the segment. Finally, based on the prediction result of each segment, the longest continuous subsequence of segments classified as preferential listening segments is determined, and the segments in that subsequence are merged to obtain the preferential listening segment in the audio work.
FIG. 14 is a diagram of the network structure of the preferential listening segment prediction model, in one embodiment. Referring to fig. 14, the preferential listening segment prediction model includes a preprocessing layer, a feature extraction layer, and a classifier. The input of the feature extraction layer is the spectrogram of a segment. The feature extraction layer may include two convolution layers and two pooling layers, where convolution layers and pooling layers are connected alternately in sequence. The two convolution layers are used to extract features of the spectrogram; the first pooling layer is a max pooling layer for amplifying salient feature values and removing redundant information, and the second pooling layer is an average pooling layer for reducing neighborhood variance in the picture. The model further comprises two fully connected layers and a Dropout layer, and the result output by the last fully connected layer serves as the input of the classifier, which outputs the prediction result.
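The described structure — two convolution layers alternating with a max pooling layer and an average pooling layer, then fully connected layers with Dropout and a binary output — can be sketched in PyTorch. All channel counts, kernel sizes, and the 64×128 input spectrogram size are illustrative assumptions not given in the source.

```python
import torch
import torch.nn as nn

class PreviewSegmentPredictor(nn.Module):
    """Hypothetical sketch of the preferential listening segment
    prediction network; sizes are assumptions for illustration."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # spectrogram is 1-channel
            nn.ReLU(),
            nn.MaxPool2d(2),   # max pooling: keep salient feature values
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AvgPool2d(2),   # average pooling: smooth neighborhood variance
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 16 * 32, 128),  # assumes a 64x128 input spectrogram
            nn.ReLU(),
            nn.Dropout(0.5),               # random suppression of neurons
            nn.Linear(128, 1),
            nn.Sigmoid(),                  # probability of "preferential listening segment"
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = PreviewSegmentPredictor()
prob = model(torch.randn(4, 1, 64, 128))  # batch of 4 spectrogram "images"
```

Each 64×128 spectrogram is halved twice by the pooling layers to 16×32 with 32 channels, which fixes the first fully connected layer's input size at 32·16·32.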
In one embodiment, the training step of the preferential listening segment prediction model includes: dividing sample audio into a plurality of equal-length sample segments; determining the labeling information of whether each sample segment belongs to the preferential listening segment; inputting each sample segment into the preferential listening segment prediction model, converting the sample segment into a corresponding spectrogram, extracting the image features of the spectrogram, and outputting, based on the image features, the predicted probability that the sample segment belongs to the preferential listening segment; calculating a cross entropy loss based on the labeling information of the sample segment and the predicted probability; and updating the model parameters of the preferential listening segment prediction model according to the cross entropy loss to obtain the trained preferential listening segment prediction model.
The labeling information of the sample audio is generated according to the preferential listening segment labeled by the producer or owner of the sample audio. Specifically, in the sample audio, segments that fall within the preferential listening segment labeled by the producer or owner are labeled as belonging to the preferential listening segment, and the remaining segments are labeled as not belonging to the preferential listening segment.
For example, the total length of the sample audio work is 8 seconds, and the preferential listening segment labeled by the producer or owner is seconds 1 to 7.
If the audio work is divided into 5-second segments with a 1-second stride, obtaining 4 segments, namely the segment of seconds 0-5, the segment of seconds 1-6, the segment of seconds 2-7, and the segment of seconds 3-8, then the labeling information of the 4 segments is: not a preferential listening segment, a preferential listening segment, a preferential listening segment, not a preferential listening segment, which may be expressed as 0, 1, 1, and 0, respectively.
If the audio work is divided into 3-second segments, obtaining 3 segments, namely the segment of seconds 0-3, the segment of seconds 3-6, and the segment of seconds 6-8, then the labeling information of the 3 segments is: not a preferential listening segment, a preferential listening segment, not a preferential listening segment, which may be expressed as 0, 1, and 0, respectively. Here 1 indicates a preferential listening segment, and 0 indicates a segment that is not a preferential listening segment.
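The labeling rule in the two examples above can be sketched as follows: a segment receives label 1 only when it lies entirely inside the annotated preferential listening range. This is a minimal illustration of the rule as it appears in the examples.

```python
def label_windows(windows, marked):
    """windows: list of (start, end) segment times; marked: (start, end)
    of the preferential listening segment annotated by the producer or
    owner. A window is labeled 1 only if it lies entirely inside the
    annotated range, otherwise 0."""
    lo, hi = marked
    return [1 if lo <= s and e <= hi else 0 for s, e in windows]
```

For the annotated range of seconds 1-7, the 5-second overlapping segments receive labels 0, 1, 1, 0 and the 3-second segments receive labels 0, 1, 0, matching the examples.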
In the training stage, after the network structure of the preferential listening segment prediction model is set, the sample segments divided from each sample audio are taken as processing objects and input into the preferential listening segment prediction model in turn; each segment is processed by the model to obtain the predicted probability that the sample segment belongs to the preferential listening segment. For example, the predicted probability is obtained through the processing of the preprocessing layer, feature extraction layer, and classifier of the preferential listening segment prediction model. The cross entropy loss is calculated based on the difference between the predicted probability and the labeling information of the sample segment, the model parameters of the preferential listening segment prediction model are updated according to the cross entropy loss, and training continues with the updated model parameters until a training stop condition is met, for example the number of iterations reaches a preset number or the prediction accuracy meets a preset target, at which point the trained preferential listening segment prediction model is obtained. The trained preferential listening segment prediction model may be used to predict the preferential listening segment in an audio work.
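The cross entropy loss between the predicted probabilities and the 0/1 labels can be written out directly. This is a minimal NumPy sketch of the binary cross entropy used as the training objective; the clipping constant is a standard numerical-stability assumption.

```python
import numpy as np

def cross_entropy_loss(probs, labels, eps=1e-12):
    """Binary cross entropy between the predicted probabilities that each
    sample segment belongs to the preferential listening segment and the
    0/1 labeling information: mean of -(y*log(p) + (1-y)*log(1-p))."""
    p = np.clip(np.asarray(probs, dtype=float), eps, 1 - eps)  # avoid log(0)
    y = np.asarray(labels, dtype=float)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))
```

Uninformative predictions of 0.5 give a loss of ln 2 ≈ 0.693, while confident correct predictions drive the loss toward 0 — the gradient of this quantity is what updates the model parameters.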
In one embodiment, the triggering operation of searching audio in the audio search interface triggers a search request, and the search request is divided into fuzzy search and accurate search according to whether the search request carries search words. That is, when the user inputs the name of the audio work, an accurate search is triggered, and when the user triggers a search request by selecting a filter tag, a fuzzy search is triggered.
The data corresponding to fuzzy search and accurate search are stored in different databases. FIG. 15 is a schematic diagram of a data storage system for audio works in one embodiment. Referring to FIG. 15, the data storage system includes the relational database MySQL, the cache Redis, and the search database Elasticsearch. MySQL is stored on hard disk, has relatively low cost, and is used for storing the complete audio data. Redis is a cache stored in memory, has a high query speed, is used for storing high-frequency entry data such as the names of audio works, and supports accurate search. Elasticsearch serves as a search engine in front of MySQL, stores a large number of filter tags, and supports fuzzy search. Redis is used in combination with Elasticsearch to improve the efficiency of searching and querying audio data from the database. When the producer or owner of an audio work uploads the audio work, the data related to the audio work (such as the audio file, audio information, preferential listening segment information, and segmentation information) is stored in the relational database MySQL. When the binlog data in MySQL changes, Redis is notified through a Kafka message queue to synchronously update the corresponding data; meanwhile, the MySQL binlog is monitored through the open source library go-mysql-elasticsearch, and when the binlog data in MySQL changes, the corresponding data in Elasticsearch is synchronously updated.
Referring to fig. 15, when a user initiates a search request, different processing is performed according to the search type of the request. When the search type is accurate search, that is, when the user inputs a search word related to an audio work, the server first checks whether an audio work with the search word as its name exists in the cache Redis. If so, the server directly queries the corresponding search result from Redis and returns it. If no such audio work exists in Redis, the server further queries the relational database MySQL, returns the search result matching the search word to the terminal, and synchronously updates that search result into Redis through the Kafka message queue. When the search type is fuzzy search, that is, when the user selects several screening tags and directly triggers a search, the server first queries the search database Elasticsearch for audio works matching the screening tags. If they exist, the server returns them directly; if not, the server further queries MySQL, returns the search results matching the screening tags to the terminal, and synchronously updates the queried results into Elasticsearch through go-mysql-elasticsearch.
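The routing and backfill flow can be sketched as follows. This is a minimal illustration in which plain dicts stand in for the real stores (Redis, Elasticsearch, MySQL); the work names, ids, and synchronous backfill are illustrative assumptions — in the described system the backfill goes through a Kafka queue and go-mysql-elasticsearch rather than inline writes.

```python
# Dicts standing in for the real stores: Redis cache, Elasticsearch index, MySQL.
redis_cache = {"Sunrise Beat": {"id": 1}}
es_index = {("hip-hop",): [{"id": 2}]}
mysql_db = {
    "by_name": {"Sunrise Beat": {"id": 1}, "Night Drive": {"id": 3}},
    "by_tags": {("hip-hop",): [{"id": 2}], ("R&B",): [{"id": 4}]},
}

def search(term=None, tags=None):
    if term is not None:                  # accurate search: check Redis first
        hit = redis_cache.get(term)
        if hit is None:
            hit = mysql_db["by_name"].get(term)   # fall back to MySQL
            if hit is not None:
                redis_cache[term] = hit           # backfill (Kafka queue in the real system)
        return hit
    key = tuple(sorted(tags))             # fuzzy search: check Elasticsearch first
    hits = es_index.get(key)
    if hits is None:
        hits = mysql_db["by_tags"].get(key, [])   # fall back to MySQL
        es_index[key] = hits                      # sync back (go-mysql-elasticsearch in the real system)
    return hits
```

After a miss, the result is served from MySQL and written back to the faster store, so the next identical request is answered without touching the relational database — the cache-aside pattern the figure describes.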
Because the cache Redis is stored in memory, searching Redis directly avoids the query latency caused by the large number of I/O operations needed to query MySQL on hard disk; Elasticsearch, through its internally constructed forward and inverted indexes, avoids the full-table traversal of MySQL that fuzzy search would otherwise require. Combining the two improves the speed of searching and querying data.
In a specific embodiment, the audio playing method specifically may include the following steps:
the server acquires the uploaded audio work. When the producer or owner of the audio work does not mark the audio work in a segmented manner when uploading it, the server can divide the audio work into a plurality of segments by using an audio segmentation model and obtain the start and end time points of each segment. The server may also predict the preferential listening segments among the segments of the audio work using the preferential listening segment prediction model, and combine consecutive preferential listening segments to obtain the start and end time points corresponding to the preferential listening segment. When predicting the preferential listening segment, the server divides the audio work into a plurality of equal-length segments, performs classification prediction on each segment through the trained preferential listening segment prediction model to obtain a prediction result of whether each segment belongs to the preferential listening segment, and merges a plurality of continuous preferential listening segments in the audio work according to the prediction results to obtain the preferential listening segment in the audio work.
The terminal logs in the parent application with the target mark based on the user operation, enters the child application running in the parent application, displays the audio playing interface of the pushed audio work in the child application, and plays the preferential listening clip in the pushed audio work in the audio playing interface.
In addition, the terminal can display the segment marks of all the segments in the pushed audio work on the playing progress bar of the audio playing interface, all the segments in the pushed audio work comprise the preferential listening segments, all the segments have corresponding segment types, the terminal responds to the triggering operation of the target segment marks on the playing progress bar to stop playing the preferential listening segments, and the target segments matched with the target segment marks in the pushed audio work begin to be played.
In addition, the terminal can display a card area corresponding to the current played audio work in the audio playing interface, stop playing the preferential listening clip in the current played audio work in response to the triggering operation of switching the audio in the card area, play the preferential listening clip in the next pushed audio work, and remove the card area corresponding to the current played audio work from the audio playing interface so as to gradually display the card area corresponding to the next pushed audio work.
In addition, the terminal can also respond to the triggering operation of collecting the audio in the audio playing interface and add the currently played audio works to the audio collection list of the target account, so that the terminal can respond to the triggering operation of the audio collection icon in the audio playing interface and display the audio collection interface where the audio collection list is located.
In addition, the terminal can also, in response to a triggering operation of viewing audio details in the audio playing interface, display the audio detail interface of the currently played audio work, display the playing progress bar of the currently played audio work in the audio detail interface, and display the segment marks of all segments of the currently played audio work on that playing progress bar. The terminal can then, in response to a triggering operation on a target segment mark on the playing progress bar of the audio detail interface, stop playing the preferential listening clip and play the target segment in the currently played audio work that matches the target segment mark.
In addition, the terminal can also, in response to a triggering operation of an audio transaction in the audio detail interface, display the transaction interface of the currently played audio work; in response to a triggering operation of submitting an order in the transaction interface, generate a transaction order of the currently played audio work based on the transaction value; and perform a resource transfer operation on the transaction account associated with the target identifier according to the transaction value of the audio work included in the transaction order.
In addition, the terminal can also enter an audio search interface of the audio work, and display the audio work matched with the search word in a search result area of the audio search interface in response to the input of the search word in a search box of the audio search interface.
Further, the terminal may also display, in a search result area of the audio search interface, an audio work that matches the search term and includes a segment of the target segment type in response to selecting the target segment type from a plurality of preset segment types displayed in the audio search interface.
Further, the terminal may also display an audio work matching the target tag in a search result area of the audio search interface in response to selecting the target tag from a plurality of preset filter tags displayed in the audio search interface.
In addition, the terminal can also, in response to a triggering operation of viewing the audio details of a target audio work in the search result area, display the audio detail interface of the target audio work, which includes the transaction value of the target audio work; in response to a triggering operation of an audio transaction in the audio detail interface, display the transaction interface of the target audio work; in response to a triggering operation of submitting an order in the transaction interface, generate a transaction order of the target audio work based on the transaction value; and perform a resource transfer operation on the transaction account associated with the target identifier according to the transaction value of the audio work included in the transaction order.
According to the above audio playing method, the segments constituting the audio work include a preferential listening clip, and the preferential listening clip of the pushed audio work is played directly after entering the audio playing interface of the audio work. Since the preferential listening clip is the segment, among those constituting the audio work, that the user is intended to hear first, the matching efficiency between audio works and the user who opens the audio playing interface can be improved, enabling the user to find an audio work of interest as soon as possible. In addition, each segment constituting the audio work has a corresponding segment type, and the segment mark of each segment is displayed on the playing progress bar of the audio playing interface, so that the user can play segments of different types in the audio work by triggering the segment marks on the playing progress bar. Therefore, for each pushed audio work, only part of its clips need to be played to complete the preview of the work, rather than all of them. When the number of audio works is large, this greatly improves the matching efficiency between audio works and users who open the audio playing interface, and improves the efficiency of finding audio works among massive audio works.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments; their execution order is not necessarily sequential, and they may be performed in turn or alternately with at least part of the other steps, sub-steps, or stages.
Based on the same inventive concept, an embodiment of the present application further provides an audio playing device for implementing the audio playing method described above. The implementation of the solution provided by the device is similar to that described in the above method, so for the specific limitations in one or more embodiments of the audio playing device provided below, reference may be made to the limitations of the audio playing method above, which are not repeated here.
In one embodiment, as shown in fig. 16, there is provided an audio playing device 1600 comprising: a play module 1602, a display module 1604, and a clip switch module 1606, wherein:
the playing module 1602 is configured to display an audio playing interface of the pushed audio work, and play a preferential listening clip in the pushed audio work;
the display module 1604 is configured to display, on a playing progress bar of the audio playing interface, a segment mark of each segment in the pushed audio work, where each segment in the pushed audio work includes a priority listening segment, and each segment has a corresponding segment type;
the segment switching module 1606 is configured to play the target segment matching the target segment mark in the pushed audio work in response to the triggering operation on the target segment mark on the playing progress bar.
In one embodiment, the audio playback device 1600 further comprises:
and the audio switching module is used for responding to the triggering operation of switching the audio in the audio playing interface and playing the preferential listening clip in the next pushed audio work.
In one embodiment, the audio switching module is further configured to display, in the audio playing interface, a card area corresponding to the currently played audio work; and responding to the triggering operation of switching the audio in the card area, stopping playing the preferential listening section in the currently played audio work, and playing the preferential listening section in the next pushed audio work.
In one embodiment, the audio playing device 1600 further includes a card changing module, configured to: in response to a triggering operation of switching audio in the card area, move the card area corresponding to the currently played audio work out of the audio playing interface; and display the card area corresponding to the next pushed audio work while the card of the currently played audio work is being moved out of the audio playing interface.
In one embodiment, the card changing module is further configured to obtain the current card binary array, where the first element and the second element of the card binary array alternately carry the audio information of the currently played audio work and of the next pushed audio work, the audio information being used to render the card area corresponding to an audio work. When the audio information of the currently played audio work comes from the first element of the card binary array, in response to a triggering operation of switching audio in the card area, the module renders and updates the card area corresponding to the currently played audio work according to the second element, switches to and plays the audio work described by the second element, and, after the audio information of the next pushed audio work is obtained, updates the first element with that audio information. When the audio information of the currently played audio work comes from the second element of the card binary array, the roles are reversed: the module renders and updates the card area according to the first element, switches to and plays the audio work described by the first element, and updates the second element with the audio information of the next pushed audio work once it is obtained.
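The alternating two-element card array described above behaves like a double buffer: one slot backs the visible card while the other is refilled with the next pushed work, and the roles swap on every switch. The following sketch is a hypothetical illustration of that alternation; the class and field names are assumptions, not the application's code.

```python
# Hypothetical sketch of the two-element card buffer: the slot backing
# the visible card and the standby slot alternate on each audio switch,
# and the vacated slot is refilled with the next pushed work.

class CardBuffer:
    def __init__(self, current_info, next_info):
        self.slots = [current_info, next_info]  # the "card binary array"
        self.active = 0                          # index of the visible card

    def switch(self, fetch_next):
        """Swap to the standby slot, then refill the vacated slot with
        the next pushed work returned by fetch_next()."""
        vacated = self.active
        self.active = 1 - self.active            # standby slot becomes visible
        self.slots[vacated] = fetch_next()       # refill behind the scenes
        return self.slots[self.active]           # info used to render the card

feed = iter(["work-C", "work-D", "work-E"])
buf = CardBuffer("work-A", "work-B")
print(buf.switch(lambda: next(feed)))  # → work-B
print(buf.switch(lambda: next(feed)))  # → work-C
```

Because the refill happens in the slot that just left the screen, the next card's audio information is already in place before the user triggers the following switch.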
In one embodiment, the audio playback device 1600 further comprises:
the audio work collection module is used for responding to the triggering operation of collecting audio in the audio playing interface and adding the currently played audio work to the audio collection list of the target account; and responding to the triggering operation of the audio collection icon in the audio playing interface, and displaying the audio collection interface where the audio collection list is located.
In one embodiment, the audio playback device 1600 further comprises:
the audio detail viewing module is used for responding to the triggering operation of viewing the audio details in the audio playing interface and displaying the audio detail interface of the currently played audio work;
the display module is also used for displaying the playing progress bar of the currently played audio work in the audio detail interface; and displaying the segment marks of all the segments in the currently played audio work on the playing progress bar of the audio detail interface.
In one embodiment, the playing module 1602 is further configured to stop playing the preferred listening clip and play the target clip matching the target clip mark in the currently played audio work in response to a triggering operation of the target clip mark on the playing progress bar of the audio detail interface.
In one embodiment, the audio detail interface includes a trading value for the currently playing audio work; the audio playback apparatus 1600 further includes:
the audio transaction module is used for responding to the triggering operation of the audio transaction in the audio detail interface and displaying the transaction interface of the currently played audio work; and responding to the triggering operation of submitting orders in the trading interface, and generating trading orders of the currently played audio works based on the trading values.
In one embodiment, the play module 1602 is further to:
in response to input of a target segment type via the audio playing interface, present the audio information of the pushed audio work on the audio playing interface, where the preferential listening clip of the pushed audio work is a segment of the target segment type.
In one embodiment, the audio playback device 1600 further comprises:
the pushing module is used for acquiring historical preferential listening data corresponding to the target account; determining a preference segment type corresponding to the target account according to the historical preferential listening data; and pushing audio works whose preferential listening clips are of the preference segment type.
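One simple way to realize the preference determination described above is to take the most frequent segment type in the account's historical preferential listening records. This is a hypothetical sketch; the data shape and function name are assumptions, not the application's method.

```python
# Hypothetical sketch: derive a preference segment type for a target
# account by counting segment types in its preferential listening history.

from collections import Counter

def preference_segment_type(history):
    """history: list of segment-type strings from past preferential
    listening records; returns the most common type, or None."""
    if not history:
        return None
    return Counter(history).most_common(1)[0][0]

print(preference_segment_type(["chorus", "verse", "chorus", "intro"]))
# → chorus
```

Audio works whose preferential listening clips match the returned type would then be ranked higher in the push queue.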
In one embodiment, the audio playback device 1600 further comprises:
the audio work searching module is used for displaying an audio searching interface of the audio work; in response to entering a search term in a search box of the audio search interface, an audio work matching the search term is presented in a search result area of the audio search interface.
In one embodiment, the audio work searching module is further configured to, in response to selecting the target segment type from a plurality of preset segment types displayed on the audio searching interface, display, in a search result area of the audio searching interface, an audio work that matches the search term and includes a segment of the target segment type.
In one embodiment, the audio work searching module is further configured to, in response to selecting a target tag from a plurality of preset filtering tags displayed on the audio searching interface, display an audio work matching the target tag in a search result area of the audio searching interface.
In one embodiment, the audio playback device 1600 further comprises:
the audio detail viewing module is used for responding to the triggering operation of viewing the audio details of the target audio works in the search result area and displaying an audio detail interface of the target audio works; the audio detail interface comprises a transaction value of the target audio work;
the audio transaction module is used for responding to the triggering operation of the audio transaction in the audio detail interface and displaying the transaction interface of the target audio work; and responding to the triggering operation of submitting the order in the trading interface, and generating a trading order of the target audio work based on the trading value.
In one embodiment, the playing module is further configured to display an audio playing interface of the pushed audio work in a child application, where the child application is an application program running in a parent application, and the parent application is associated with the target identifier;
the audio playback apparatus 1600 further includes:
and the resource transfer module is used for carrying out resource transfer operation on the transaction account number associated with the target identifier according to the transaction numerical value of the audio work included in the transaction order.
In one embodiment, the audio playback device 1600 further comprises:
the preferential listening segment generation module is used for dividing the audio work into a plurality of segments with equal length; respectively carrying out classified prediction on each segment of the audio work through a trained preferential listening segment prediction model to obtain a prediction result of whether each segment belongs to the preferential listening segment; and merging a plurality of continuous preferential listening clips in the audio work according to the prediction result to obtain the preferential listening clips in the audio work.
In one embodiment, the preferential listening clip generation module is further configured to convert each clip into a corresponding spectrogram through the trained preferential listening clip prediction model, extract image features of the spectrogram, and output, based on the image features, a prediction result of whether the clip belongs to a preferential listening clip.
In one embodiment, the audio playing device 1600 further includes a training module for dividing sample audio into a plurality of sample clips of equal length and determining labeling information on whether each sample clip belongs to a preferential listening clip; inputting each sample clip into the preferential listening clip prediction model, converting the sample clip into a corresponding spectrogram through the model, extracting image features of the spectrogram, and outputting, based on the image features, the predicted probability that the sample clip belongs to a preferential listening clip; and calculating a cross-entropy loss based on the labeling information of the sample clips and the predicted probabilities that the sample clips belong to preferential listening clips, and updating the model parameters of the preferential listening clip prediction model according to the cross-entropy loss to obtain the trained preferential listening clip prediction model.
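The training objective described above — cross-entropy between each sample clip's label and the model's predicted probability of being a preferential listening clip — can be illustrated as follows. The model and spectrogram feature extraction are abstracted away; the function name and the example numbers are assumptions, not values from the application.

```python
# Hypothetical sketch of the training loss: binary cross-entropy between
# each sample clip's label (1 = preferential listening clip) and the
# model's predicted probability for that clip.

import math

def cross_entropy_loss(labels, probs, eps=1e-12):
    """Mean binary cross-entropy over sample clips."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)            # clamp for numeric safety
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# A confident, mostly correct model yields a small loss
loss = cross_entropy_loss([1, 0, 1], [0.9, 0.2, 0.8])
print(round(loss, 4))  # → 0.1839
```

Model parameters would then be updated by gradient descent on this loss, which decreases as the predicted probabilities approach the labels.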
In the above audio playing device 1600, the segments constituting the audio work include a preferential listening clip, and the preferential listening clip of the pushed audio work is played directly after entering the audio playing interface of the audio work. Since the preferential listening clip is the segment, among those constituting the audio work, that the user is intended to hear first, the matching efficiency between audio works and the user who opens the audio playing interface can be improved, enabling the user to find an audio work of interest as soon as possible. In addition, each segment constituting the audio work has a corresponding segment type, and the segment mark of each segment is displayed on the playing progress bar of the audio playing interface, so that the user can play segments of different types in the audio work by triggering the segment marks on the playing progress bar. Therefore, for each pushed audio work, only part of its clips need to be played to complete the preview of the work, rather than all of them. When the number of audio works is large, this greatly improves the matching efficiency between audio works and users who open the audio playing interface, and improves the efficiency of finding audio works among massive audio works.
The various modules in the audio playback device 1600 described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 17. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless mode can be realized through Wi-Fi, a mobile cellular network, NFC (near field communication), or other technologies. The computer program, when executed by the processor, implements an audio playing method. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, keys, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse. The computer device further includes an input/output interface, which is a connecting circuit for exchanging information between the processor and external devices; it is connected to the processor through the bus and is referred to as an I/O interface for short.
It will be appreciated by those skilled in the art that the structure shown in fig. 17 is merely a block diagram of a portion of the structure associated with the present application and is not limiting of the computer device to which the present application applies, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, where the memory stores a computer program, and the processor implements the audio playing method provided in any of the embodiments of the present application when executing the computer program.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the audio playing method provided by any of the embodiments of the present application.
In one embodiment, a computer program product is provided, comprising a computer program that, when executed by a processor, implements the audio playback method provided by any of the embodiments of the present application.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that implementing all or part of the above methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may include the procedures of the embodiments of the above methods. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational databases and non-relational databases; the non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples represent only a few embodiments of the present application, which are described in relative detail but are not thereby to be construed as limiting the scope of the application. It should be noted that several modifications and improvements can be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (23)

1. An audio playing method, characterized in that the method comprises:
displaying an audio playing interface of the pushed audio work and playing a preferential listening clip in the pushed audio work;
displaying segment marks of all segments in the pushed audio work on a playing progress bar of the audio playing interface, wherein all segments in the pushed audio work comprise the preferential listening segment, and all segments have corresponding segment types;
And responding to the triggering operation of the target segment mark on the playing progress bar, and playing the target segment matched with the target segment mark in the pushed audio work.
2. The method according to claim 1, wherein the method further comprises:
and responding to the triggering operation of switching the audio in the audio playing interface, and playing the preferential listening clip in the next pushed audio work.
3. The method of claim 2, wherein playing the preferred listening clip in the next pushed audio work in response to a triggering operation to switch audio in the audio playback interface comprises:
displaying a card area corresponding to the currently played audio work in the audio playing interface;
and responding to the triggering operation of switching the audio in the card area, stopping playing the preferential listening clip in the currently played audio work, and playing the preferential listening clip in the next pushed audio work.
4. A method according to claim 3, characterized in that the method further comprises:
responding to the triggering operation of switching the audio in the card area, and moving out the card area corresponding to the currently played audio work from the audio playing interface;
And displaying a card area corresponding to the next pushed audio work in the process of moving the card of the currently played audio work out of the audio playing interface.
5. A method according to claim 3, characterized in that the method further comprises:
acquiring a current card binary array, wherein a first element and a second element in the card binary array respectively represent the audio information of the current played audio work and the audio information of the next pushed audio work alternately, and the audio information is used for rendering a card area corresponding to the audio work;
when the audio information of the currently played audio work is derived from the first element in the card binary array, responding to the triggering operation of switching audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the second element in the card binary array, switching the audio work according to the second element in the card binary array for playing, and updating the first element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained;
When the audio information of the currently played audio work is derived from the second element in the card binary array, responding to the triggering operation of switching the audio in the card area, rendering and updating the card area corresponding to the currently played audio work according to the first element in the card binary array, switching the audio work according to the first element in the card binary array for playing, and updating the second element in the card binary array according to the audio information of the next pushed audio work after the audio information of the next pushed audio work is obtained.
6. The method according to claim 1, wherein the method further comprises:
responding to the triggering operation of collecting audio in the audio playing interface, and adding the currently played audio works to an audio collection list of a target account;
and responding to the triggering operation of the audio collection icon in the audio playing interface, and displaying the audio collection interface where the audio collection list is located.
7. The method according to claim 1, wherein the method further comprises:
responding to the triggering operation of checking the audio details in the audio playing interface, and displaying the audio details interface of the currently played audio work;
Displaying a playing progress bar of the currently played audio work in the audio detail interface;
and displaying the segment marks of all the segments in the currently played audio work on the playing progress bar of the audio detail interface.
8. The method of claim 7, wherein the method further comprises:
and responding to the triggering operation of the target segment mark on the playing progress bar of the audio detail interface, stopping playing the preferential listening segment, and playing the target segment matched with the target segment mark in the currently played audio work.
9. The method of claim 7, wherein the audio detail interface includes a trading value for the currently playing audio work; the method further comprises the steps of:
responding to the triggering operation of the audio transaction in the audio detail interface, and displaying the transaction interface of the currently played audio work;
and responding to the triggering operation of submitting orders in the trading interface, and generating the trading orders of the currently played audio works based on the trading values.
10. The method according to claim 1, wherein the method further comprises:
And in response to the input of the target segment type from the audio playing interface, displaying the audio information of the pushed audio work on the audio playing interface, wherein the preferential listening segment in the pushed audio work is the segment of the target segment type.
11. The method according to claim 1, wherein the method further comprises:
acquiring historical preferential listening data corresponding to a target account;
determining a preference segment type corresponding to the target account according to the historical preferential listening data;
and pushing the audio work of which the preferential listening clip is of the preference segment type.
12. The method according to claim 1, wherein the method further comprises:
displaying an audio search interface of the audio work;
and responding to the input of the search word in the search box of the audio search interface, and displaying the audio works matched with the search word in the search result area of the audio search interface.
13. The method according to claim 12, wherein the method further comprises:
and responding to the selection of a target fragment type from a plurality of preset fragment types displayed on the audio search interface, and displaying the audio works which are matched with the search word and comprise the fragments of the target fragment type in a search result area of the audio search interface.
14. The method according to claim 12, further comprising:
in response to a target tag being selected from a plurality of preset filter tags displayed on the audio search interface, displaying, in the search result area of the audio search interface, audio works matching the target tag.
15. The method according to claim 12, further comprising:
in response to a trigger operation for viewing audio details of a target audio work in the search result area, displaying an audio detail interface of the target audio work, the audio detail interface including a transaction value of the target audio work;
in response to a trigger operation for an audio transaction in the audio detail interface, displaying a transaction interface of the target audio work; and
in response to a trigger operation for submitting an order in the transaction interface, generating a transaction order for the target audio work based on the transaction value.
16. The method of claim 9 or 15, wherein displaying the audio playing interface of the pushed audio work comprises:
displaying the audio playing interface of the pushed audio work in a child application, the child application being an application program running within a parent application, and the parent application being associated with a target identifier;
the method further comprising:
performing a resource transfer operation on a transaction account associated with the target identifier according to the transaction value of the audio work included in the transaction order.
17. The method of any one of claims 1 to 15, wherein the step of generating the priority listening segment in the audio work comprises:
dividing the audio work into a plurality of segments of equal length;
performing classification prediction on each segment of the audio work through a trained priority-listening-segment prediction model to obtain, for each segment, a prediction result indicating whether the segment belongs to a priority listening segment; and
merging a plurality of consecutive priority listening segments in the audio work according to the prediction results to obtain the priority listening segment of the audio work.
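The merging step of claim 17 — joining runs of consecutive segments that the classifier marked as priority listening — can be sketched as below. The boolean-list representation of the per-segment predictions and the (start, end) interval output are illustrative assumptions, not the patent's data format:

```python
def merge_preferred_segments(predictions, segment_len):
    """Merge runs of consecutive segments predicted as priority-listening
    into (start_time, end_time) intervals.

    `predictions` is a list of booleans, one per equal-length segment, as a
    per-segment classifier might produce; `segment_len` is the segment
    duration in seconds. Both names are hypothetical.
    """
    intervals, start = [], None
    for i, is_preferred in enumerate(predictions):
        if is_preferred and start is None:
            start = i * segment_len          # a run of preferred segments begins
        elif not is_preferred and start is not None:
            intervals.append((start, i * segment_len))  # run ends
            start = None
    if start is not None:                    # run extends to the end of the work
        intervals.append((start, len(predictions) * segment_len))
    return intervals

# Segments 2-4 and 6 predicted preferred, with 5-second segments:
print(merge_preferred_segments([False, False, True, True, True, False, True], 5))
# [(10, 25), (30, 35)]
```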
18. The method of claim 17, wherein performing classification prediction on each segment of the audio work through the trained priority-listening-segment prediction model to obtain a prediction result indicating whether each segment belongs to a priority listening segment comprises:
converting, through the trained priority-listening-segment prediction model, the segment into a corresponding spectrogram, extracting image features of the spectrogram, and outputting, based on the image features, a prediction result indicating whether the segment belongs to a priority listening segment.
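The spectrogram conversion of claim 18 can be illustrated with a bare-bones short-time DFT — a sketch only: the frame size, hop length, and magnitude-only output are arbitrary choices for illustration, and a real model would feed the resulting 2-D array to an image-feature extractor such as a CNN:

```python
import cmath

def spectrogram(samples, frame_size=64, hop=32):
    """Turn a 1-D audio signal into a magnitude spectrogram: a list of
    frames, each a list of |DFT| magnitudes for the non-negative
    frequency bins. A classifier treats this 2-D array like an image.
    """
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):      # keep non-negative frequencies
            acc = sum(x * cmath.exp(-2j * cmath.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

spec = spectrogram([0.0] * 256)
print(len(spec), len(spec[0]))  # 7 frames x 32 frequency bins
```

In practice an FFT with windowing and a log-mel scale would replace this naive DFT; the point is only the signal-to-image mapping that the claim describes.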
19. The method of claim 17, wherein the step of training the priority-listening-segment prediction model comprises:
dividing sample audio into a plurality of sample segments of equal length, and determining, for each sample segment, labeling information indicating whether the sample segment belongs to a priority listening segment;
inputting each sample segment into the priority-listening-segment prediction model, converting the sample segment into a corresponding spectrogram through the model, extracting image features of the spectrogram, and outputting, based on the image features, a predicted probability that the sample segment belongs to a priority listening segment; and
computing a cross-entropy loss based on the labeling information of the sample segment and the predicted probability, and updating model parameters of the priority-listening-segment prediction model according to the cross-entropy loss to obtain the trained priority-listening-segment prediction model.
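The cross-entropy training loop of claim 19 reduces to the following minimal sketch, where a single scalar feature and a logistic model stand in for the spectrogram image features and the real network (all of which are assumptions for illustration):

```python
import math

def bce_loss(p, y):
    """Binary cross-entropy between predicted probability p and label y (0/1)."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def train_step(w, b, x, y, lr=0.1):
    """One gradient step for a single-feature logistic model; the gradient
    of BCE w.r.t. the logit is simply (p - y)."""
    p = 1.0 / (1.0 + math.exp(-(w * x + b)))  # sigmoid prediction
    grad = p - y
    return w - lr * grad * x, b - lr * grad

w, b = 0.0, 0.0
for _ in range(200):                          # toy labelled sample segments
    for x, y in [(1.0, 1), (-1.0, 0)]:
        w, b = train_step(w, b, x, y)

p = 1.0 / (1.0 + math.exp(-(w * 1.0 + b)))
print(round(p, 2))  # probability for the positive sample rises toward 1
```

The claimed method would backpropagate the same loss through the feature extractor of the prediction model; the scalar update here only makes the loss-driven parameter update concrete.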
20. An audio playing apparatus, the apparatus comprising:
a playing module, configured to display an audio playing interface of a pushed audio work and play a priority listening segment in the pushed audio work;
a display module, configured to display, on a playing progress bar of the audio playing interface, segment markers of the segments in the pushed audio work, wherein the segments in the pushed audio work include the priority listening segment and each segment has a corresponding segment type; and
a segment switching module, configured to play, in response to a trigger operation on a target segment marker on the playing progress bar, a target segment in the pushed audio work that matches the target segment marker.
21. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 19.
22. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 19.
23. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 19.
CN202111560546.9A 2021-12-20 2021-12-20 Audio playing method, device, equipment, storage medium and computer program product Pending CN116304168A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111560546.9A CN116304168A (en) 2021-12-20 2021-12-20 Audio playing method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN116304168A true CN116304168A (en) 2023-06-23

Family

ID=86801881

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code
Ref country code: HK
Ref legal event code: DE
Ref document number: 40087919
Country of ref document: HK