
CN116405713A - Audio recommendation method, device, medium and computing equipment - Google Patents

Audio recommendation method, device, medium and computing equipment

Info

Publication number
CN116405713A
CN116405713A
Authority
CN
China
Prior art keywords
audio
recommended
information
text
user
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310377279.4A
Other languages
Chinese (zh)
Inventor
章行
肖强
解忠乾
穆学锋
曹鲁豫
李佳文
周阳
刘卉芸
张壮
潘一飞
吕鸿鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Netease Cloud Music Technology Co Ltd
Original Assignee
Hangzhou Netease Cloud Music Technology Co Ltd
Application filed by Hangzhou Netease Cloud Music Technology Co Ltd
Priority to CN202310377279.4A
Publication of CN116405713A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/251 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F 16/63 Querying
    • G06F 16/635 Filtering based on additional data, e.g. user or group profiles
    • G06F 16/636 Filtering based on additional data, e.g. user or group profiles by using biological or physiological data
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/45 Management operations performed by the client for facilitating the reception of or the interaction with the content or administrating data related to the end-user or to the client device itself, e.g. learning user preferences for recommending movies, resolving scheduling conflicts
    • H04N 21/466 Learning process for intelligent management, e.g. learning user preferences for recommending movies
    • H04N 21/4668 Learning process for intelligent management, e.g. learning user preferences for recommending movies for recommending content, e.g. movies

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Physiology (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present disclosure provide an audio recommendation method, apparatus, medium, and computing device. The method comprises: acquiring audio information of audio to be recommended and target information of a user, wherein the target information comprises at least one of the user's current emotion, current time period and current location; generating a recommended voice according to the audio information and the target information; and playing the recommended voice in response to a voice introduction operation on the audio to be recommended. Because the recommended voice is generated from the audio information of the audio to be recommended together with the user's current emotion, current time period, current location and other information, the played recommended voice matches where the user is and how the user feels at the moment, arousing the user's interest in the audio to be recommended and improving the recommendation effect.

Description

Audio recommendation method, device, medium and computing equipment
Technical Field
Embodiments of the present disclosure relate to the field of audio playback technology, and more particularly, embodiments of the present disclosure relate to an audio recommendation method, apparatus, medium, and computing device.
Background
This section is intended to provide a background or context for embodiments of the present disclosure. The description herein is not admitted to be prior art by inclusion in this section.
With the development of network technology, audio producers can widely promote their work online, so a large amount of audio has appeared on the market.
Popularizing and promoting audio depends on its automatic recommendation. In a typical recommendation process, audio is pushed to an audio interface, which displays information such as the audio's producer and performer. The information displayed on the audio interface is difficult to make attractive to users, so the recommendation effect is poor and the user's listening continuity and understanding of the audio suffer.
Disclosure of Invention
The disclosure provides an audio recommendation method, an audio recommendation device, a medium and a computing device, which are used for solving the problem of poor audio recommendation effect.
In a first aspect of embodiments of the present disclosure, there is provided an audio recommendation method, including: acquiring audio information of audio to be recommended and target information of a user, wherein the target information comprises at least one of the current emotion, the current time period and the current position of the user; generating recommended voice according to the audio information and the target information; and responding to the voice introduction operation of the audio to be recommended, and playing the recommended voice.
In an embodiment of the disclosure, the generating recommended voice according to the audio information and the target information includes: determining, according to the audio information of each audio to be recommended, whether the audios to be recommended are of the same type; in response to the audios to be recommended being of the same type, acquiring a theme text corresponding to the type to which each audio to be recommended belongs; generating a first recommended text according to the target information and the theme text; and converting the first recommended text into recommended voice.
In another embodiment of the present disclosure, the generating the first recommended text according to the target information and the theme text includes: acquiring preset information corresponding to the user, wherein the preset information comprises at least one of a preset emotion, a preset time period and a preset position; and in response to the target information matching the preset information, generating a first recommended text according to the target information and the theme text.
In another embodiment of the present disclosure, after the obtaining the preset information corresponding to the user, the method further includes: and converting the theme text into recommended voice in response to the target information not matching the preset information.
In another embodiment of the present disclosure, after determining whether each of the audio to be recommended is the same type of audio according to the audio information of each of the audio to be recommended, the method further includes: and generating a second recommended text according to the target information in response to the audio to be recommended is not the same type of audio, and converting the second recommended text into recommended voice.
In another embodiment of the present disclosure, the generating the second recommended text according to the target information includes: acquiring preset information corresponding to the user, wherein the preset information comprises at least one of preset emotion, preset time period and preset position; and responding to the target information to match the preset information, and generating a second recommended text according to the target information.
In another embodiment of the present disclosure, after the obtaining the preset information corresponding to the user, the method further includes: and determining the universal voice as the recommended voice in response to the target information not matching the preset information.
In another embodiment of the present disclosure, before obtaining the theme text corresponding to the type to which each of the audio to be recommended belongs, the method further includes: acquiring a genre keyword corresponding to a target type and description information of the target type, wherein the target type is the type to which each audio to be recommended belongs; and generating a theme text corresponding to the target type according to the genre keyword and the description information.
In another embodiment of the present disclosure, the generating, according to the genre keyword and the description information, the theme text corresponding to the target type includes: inputting the genre keyword and the description information into a text generation model to obtain the theme text corresponding to the target type output by the text generation model.
In another embodiment of the present disclosure, the generating, according to the genre keyword and the description information, the theme text corresponding to the target type includes:
generating a text to be corrected according to the genre keyword and the description information;
outputting the text to be corrected and prompt information, wherein the prompt information is used for indicating that the text to be corrected should be revised; and determining the revised text as the theme text corresponding to the target type.
In another embodiment of the present disclosure, before determining whether each of the audio to be recommended is the same type of audio according to the audio information of each of the audio to be recommended, the method further includes: acquiring audio preference information of the user, and determining each audio to be determined of the user preference according to the audio preference information; ranking the user preference of each audio to be determined; and determining each audio to be recommended in the sequenced audio to be determined.
In another embodiment of the present disclosure, the determining each audio to be recommended in the sequenced audio to be determined includes: determining each first audio in the ordered audio to be determined, wherein the user preference of the first audio is higher than the user preference of other audio, and the other audio is the audio to be determined which is not determined as the first audio; obtaining target parameters of each first audio, wherein the target parameters comprise at least one of hit rate, click rate, estimated playing time length and estimated collection rate of the first audio; and determining the audio to be recommended according to the first audio corresponding to the target parameter which is larger than a preset threshold.
In another embodiment of the present disclosure, the determining the audio to be recommended according to the first audio corresponding to the target parameter greater than a preset threshold includes: determining third audios with the same attribute from the second audios, wherein a second audio is a first audio whose target parameter is greater than the preset threshold, and the attribute comprises at least one of a producer, an audio identifier and a type; and deleting at least one audio from each group of third audios with the same attribute to obtain the audio to be recommended.
In another embodiment of the present disclosure, the generating recommended voices according to the audio information and the target information includes: acquiring comment information, introduction information, and lyrics and melody information of the audio to be recommended according to the audio information; generating an introduction text of the audio to be recommended according to the comment information, the introduction information, and the lyrics and melody information; and modifying the introduction text according to the target information to obtain a third recommended text, and converting the third recommended text to obtain recommended voice.
In another embodiment of the present disclosure, the generating the introduction text of the audio to be recommended according to the comment information, the introduction information, and the lyrics and melody information includes: inputting the comment information, the introduction information, and the lyrics and melody information into a comment generation model to obtain the introduction text of the audio to be recommended output by the comment generation model.
In another embodiment of the present disclosure, the recommended voices include detailed recommended voices and simple recommended voices, and the playing the recommended voices includes: acquiring a playing mode of current operation; responding to the playing mode as a detailed description mode, and playing the detailed recommended voice; and in response to the playing mode being a simple introduction mode, playing the simple recommended voice.
In another embodiment of the present disclosure, the recommended voices include detailed recommended voices and simple recommended voices, and the playing the recommended voices includes: acquiring audio playing information of the user; responsive to determining that the user plays the audio to be recommended according to the audio play information, playing the simple recommended voice; and playing the detailed recommended voice in response to the fact that the user does not play the audio to be recommended according to the audio playing information.
In another embodiment of the present disclosure, after the playing the recommended voice, the method further includes: acquiring sound waves of the recommended voice in playing; and controlling the image in the display screen to be transformed based on the fluctuation of the sound wave.
In another embodiment of the present disclosure, the playing the recommended voice includes: in response to audio introduction information being associated with the audio to be recommended, raising the play order of the audio to be recommended in a first recommendation list to obtain a second recommendation list, wherein the first recommendation list is a list obtained by sorting all the audio to be recommended; and sequentially playing the recommended voices of the audios as ordered in the second recommendation list.
In another embodiment of the present disclosure, further comprising: responding to the playing operation of the audio to be recommended, and outputting a playing interface; and responding to clicking operation of a voice introduction function on the playing interface, and playing the recommended voice, wherein the voice introduction operation comprises the clicking operation.
In a second aspect of the embodiments of the present disclosure, there is also provided an audio recommendation apparatus, including: the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring audio information of audio to be recommended and target information of a user, and the target information comprises at least one of the current emotion, the current time period and the current position of the user; the generation module is used for generating recommended voice according to the audio information and the target information; and the playing module is used for responding to the voice introduction operation of the audio to be recommended and playing the recommended voice.
In an embodiment of the disclosure, the generating module includes: the first determining unit is used for determining whether each audio to be recommended is the same type of audio according to the audio information of each audio to be recommended; the first acquisition unit is used for responding to the audio to be recommended which is the same type, and acquiring a theme text corresponding to the type to which each audio to be recommended belongs; the first generation unit is used for generating a first recommended text according to the target information and the theme text; and the first conversion unit is used for converting the first recommended text into recommended voice.
In another embodiment of the present disclosure, the first generating unit includes: a first obtaining subunit, configured to obtain preset information corresponding to the user, where the preset information includes at least one of a preset emotion, a preset time period, and a preset position; and a first generation subunit, configured to generate a first recommended text according to the target information and the theme text in response to the target information matching the preset information.
In another embodiment of the present disclosure, the first generating unit further includes: and the conversion subunit is used for converting the theme text into recommended voice in response to the fact that the target information does not match the preset information.
In another embodiment of the present disclosure, the generating module further includes: and the second generation unit is used for responding to the audio that the audio to be recommended is not of the same type, generating a second recommended text according to the target information and converting the second recommended text into recommended voice.
In another embodiment of the present disclosure, the second generating unit includes: the second acquisition subunit is used for acquiring preset information corresponding to the user, wherein the preset information comprises at least one of preset emotion, preset time period and preset position; and the second generation subunit is used for responding to the target information to match the preset information and generating a second recommended text according to the target information.
In another embodiment of the present disclosure, the second generating unit further includes: and the first determining subunit is used for determining the universal voice as the recommended voice in response to the fact that the target information does not match the preset information.
In another embodiment of the present disclosure, the generating module further includes: the second acquisition unit is used for acquiring a genre keyword corresponding to a target type and description information of the target type, wherein the target type is the type to which each audio to be recommended belongs; and the second generation unit is used for generating a theme text corresponding to the target type according to the genre keyword and the description information.
In another embodiment of the present disclosure, the second generating unit includes: the first input subunit is used for inputting the genre keyword and the description information into a text generation model to obtain the theme text corresponding to the target type output by the text generation model.
In another embodiment of the present disclosure, the second generating unit includes: the second generation subunit is used for generating a text to be corrected according to the genre keyword and the description information; the output subunit is used for outputting the text to be corrected and prompt information, the prompt information being used for indicating that the text to be corrected should be revised; and the second determining subunit is used for determining the revised text as the theme text corresponding to the target type.
In another embodiment of the present disclosure, the generating module further includes: a third obtaining unit, configured to obtain audio preference information of the user, and determine each audio to be determined of the user preference according to the audio preference information; the sorting unit is used for sorting the user preference of each audio to be determined; and the second determining unit is used for determining each audio to be recommended in the sequenced audio to be determined.
In another embodiment of the present disclosure, the second determining unit includes: a third determining subunit, configured to determine, among the sequenced respective audio to be determined, respective first audio, where a user preference of the first audio is higher than a user preference of other audio, where the other audio is an audio to be determined that is not determined as the first audio; the third acquisition subunit is used for acquiring target parameters of each first audio, wherein the target parameters comprise at least one of hit rate, click rate, estimated playing time length and estimated collection rate of the first audio; and the fourth determination subunit is used for determining the audio to be recommended according to the first audio corresponding to the target parameter which is larger than a preset threshold.
In another embodiment of the present disclosure, the fourth determining subunit includes: a determining component, configured to determine third audios with the same attribute from the second audios, where a second audio is a first audio whose target parameter is greater than the preset threshold, and the attribute includes at least one of a producer, an audio identifier, and a type; and an acquisition component, configured to delete at least one audio from each group of third audios with the same attribute to obtain the audio to be recommended.
In another embodiment of the present disclosure, the generating module includes: a fourth obtaining unit, configured to obtain comment information, introduction information, and lyrics and melody information of the audio to be recommended according to the audio information; a third generation unit, configured to generate an introduction text of the audio to be recommended according to the comment information, the introduction information, and the lyrics and melody information; and a modification unit, configured to modify the introduction text according to the target information to obtain a third recommended text, and convert the third recommended text to obtain recommended voice.
In another embodiment of the present disclosure, the third generating unit includes: a second input subunit, configured to input the comment information, the introduction information, and the lyrics and melody information into a comment generation model to acquire the introduction text of the audio to be recommended output by the comment generation model.
In another embodiment of the present disclosure, the recommended voices include detailed recommended voices and simple recommended voices, and the playing module includes: a fifth obtaining unit, configured to obtain a currently running play mode; the first playing unit is used for responding to the playing mode as a detailed description mode and playing the detailed recommended voice; and the second playing unit is used for responding to the playing mode being a simple introduction mode and playing the simple recommended voice.
In another embodiment of the present disclosure, the playing module includes: a sixth obtaining unit, configured to obtain audio playing information of the user; the third playing unit is used for responding to the fact that the user plays the audio to be recommended according to the audio playing information, and playing the simple recommended voice; and the fourth playing unit is used for playing the detailed recommended voice in response to the fact that the user does not play the audio to be recommended according to the audio playing information.
In another embodiment of the present disclosure, the playing module further includes: a seventh acquisition unit configured to acquire the sound wave of the recommended voice being played; and a control unit configured to control the image on the display screen to change based on the fluctuation of the sound wave.
In another embodiment of the present disclosure, the playing module includes: a processing unit, configured to, in response to audio introduction information being associated with the audio to be recommended, raise the play order of the audio to be recommended in a first recommendation list to obtain a second recommendation list, wherein the first recommendation list is a list obtained by sorting all the audio to be recommended; and a fifth playing unit, configured to sequentially play the recommended voices of the audios as ordered in the second recommendation list.
In another embodiment of the present disclosure, further comprising: the first output unit is used for responding to the playing operation of the audio to be recommended and outputting a playing interface; and the second output unit is used for responding to the clicking operation of the voice introduction function on the playing interface and playing the recommended voice, and the voice introduction operation comprises the clicking operation.
In a third aspect of embodiments of the present disclosure, there is also provided a medium comprising: computer-executable instructions, which when executed by a processor, are for implementing the audio recommendation method as described above.
In a fourth aspect of embodiments of the present disclosure, there is also provided a computing device comprising:
a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the audio recommendation method as described above.
In the embodiments of the present disclosure, the recommended voice is generated from the audio information of the audio to be recommended together with the user's current emotion, current time period, current location and other information, so that the played recommended voice matches where the user is and how the user feels at the moment. This arouses the user's interest in the audio to be recommended and improves the recommendation effect, and the user can hear an in-depth interpretation of the audio being listened to without interrupting playback, which greatly improves the listening experience.
Drawings
The above, as well as additional purposes, features, and advantages of exemplary embodiments of the present disclosure will become readily apparent from the following detailed description when read in conjunction with the accompanying drawings. Several embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings, in which:
fig. 1 schematically illustrates an audio recommendation method application scenario diagram according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram according to an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow diagram according to yet another embodiment of the present disclosure;
FIG. 5 schematically illustrates a flow diagram according to yet another embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow diagram according to yet another embodiment of the present disclosure;
FIG. 7 schematically illustrates a flow diagram according to an embodiment of the present disclosure;
FIG. 8 schematically illustrates a flow diagram according to yet another embodiment of the present disclosure;
FIG. 9 schematically illustrates a schematic diagram of a program product provided in accordance with an embodiment of the present disclosure;
fig. 10 schematically illustrates a structural diagram of an audio recommendation apparatus provided according to an embodiment of the present disclosure;
fig. 11 schematically illustrates a structural schematic diagram of a computing device provided according to an embodiment of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
The principles and spirit of the present disclosure will be described below with reference to several exemplary embodiments. It should be understood that these embodiments are presented merely to enable one skilled in the art to better understand and practice the present disclosure and are not intended to limit the scope of the present disclosure in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Those skilled in the art will appreciate that embodiments of the present disclosure may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
According to an embodiment of the disclosure, an audio recommendation method, an audio recommendation device, an audio recommendation medium and a computing device are provided.
Furthermore, any number of elements in the figures is for illustration and not limitation, and any naming is used for distinction only and not for any limiting sense.
In addition, the data involved in the present disclosure are data authorized by the user or fully authorized by all parties, and the collection, transmission and use of such data comply with the relevant national laws and regulations. The embodiments of the present disclosure may be combined with one another.
The principles and spirit of the present disclosure are explained in detail below with reference to several representative embodiments thereof.
Summary of The Invention
With the development of network technology, audio producers can widely promote their work online, so a large amount of audio has appeared on the market.
The inventors of the present disclosure observed that popularizing and promoting audio depends on its automatic recommendation. In a typical recommendation process, audio is pushed to an audio interface, which displays information such as the audio's producer and performer. The information displayed on the audio interface is difficult to make attractive to users, resulting in a poor audio recommendation effect.
The inventors therefore conceived of generating the recommended voice from the audio information of the audio to be recommended together with the user's current emotion, current time period, current location and other information, so that the played recommended voice matches where the user is and how the user feels at the moment. This arouses the user's interest in the audio to be recommended and improves the recommendation effect, and the user can hear an in-depth interpretation of the audio being listened to without interrupting playback, which greatly improves the listening experience.
Application scene overview
Referring first to fig. 1, fig. 1 is an application scenario schematic diagram of an audio recommendation method according to an embodiment of the disclosure. The audio recommendation apparatus 100 acquires audio information of the audio to be recommended and target information of the user 200. The audio information may include the producer, production background, type and other information of the audio to be recommended, and the target information includes one or more of the current emotion, current time period and current location of the user 200. The audio recommendation apparatus 100 generates a recommended voice from the target information and the audio information. When the user 200 performs a voice introduction operation on the audio to be recommended through the audio recommendation apparatus 100, the apparatus 100 plays the recommended voice.
Exemplary method
An audio recommendation method according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 8 in conjunction with the application scenario of fig. 1. It should be noted that the above application scenario is only shown for the convenience of understanding the spirit and principles of the present disclosure, and the embodiments of the present disclosure are not limited in any way in this respect. Rather, embodiments of the present disclosure may be applied to any scenario where applicable.
Referring to fig. 2, fig. 2 is a schematic flow chart schematically illustrating an embodiment of an audio recommendation method according to an embodiment of the disclosure, where the audio recommendation method includes:
step S201, acquiring audio information of the audio to be recommended and target information of the user, wherein the target information includes at least one of a current emotion, a current time period and a current position of the user.
In the present embodiment, the execution subject is an audio recommendation apparatus, hereinafter referred to simply as the device for convenience of description. The device may be any terminal with an audio playback function, for example a mobile terminal or a computer.
The device acquires the audio information of the audio to be recommended and the target information of the user. The audio information includes information such as the production background, the lyrics and melody, and the producer of the audio to be recommended. The target information includes at least one of the user's current emotion, current time period and current location. The audio information may be retrieved from a database.
The current emotion of the user may be obtained by the device. The device can collect the user's current speech and recognize speech parameters such as the user's speaking rate, intonation and volume, and then determine the current emotion from these parameters. For example, if the volume is high, the speaking rate is fast and the intonation is high, the device may determine that the user's current emotion is a negative one such as anger or agitation. Alternatively, the device can input the speech into an emotion recognition model and obtain the user's current emotion from the model's output.
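As an illustration of the rule-based mapping just described, the sketch below guesses an emotion label from coarse speech parameters. The thresholds, units and labels are assumptions for illustration only; the disclosure specifies neither concrete values nor a model architecture.

```python
# Illustrative only: map coarse speech parameters to an emotion label.
# Thresholds and labels are assumed, not taken from the disclosure.
def estimate_emotion(rate_wpm: float, pitch_hz: float, volume_db: float) -> str:
    """Rough rule-based emotion guess from speaking rate, pitch and volume."""
    if rate_wpm > 180 and pitch_hz > 220 and volume_db > 70:
        return "negative"  # e.g. anger or agitation
    if rate_wpm > 150 and pitch_hz > 200:
        return "positive"  # e.g. cheerful
    return "neutral"

print(estimate_emotion(190, 240, 75))  # -> negative
```

In practice such rules would be replaced by the emotion recognition model mentioned above; the rule form simply makes the parameter-to-emotion mapping concrete.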
The current time period may be a period of the day, for example morning, noon, afternoon or evening; it may also be the month or season in which the current time falls. The device obtains the current time period from the electronic calendar, and the current location may be obtained by the device's positioning module.
Step S202, recommended voices are generated according to the audio information and the target information.
After obtaining the audio information and the target information, the device may generate the recommended voice. For example, based on the audio information, the device may generate a background description for the recommended voice, such as a description of the audio's producer, its genre, or the background against which it was created. Based on the target information, the device may then generate the part of the recommended voice that matches the user's current emotion, current time period or current location. For example, if the user's current emotion is cheerful, the recommended voice includes an introduction to a cheerful passage of the audio to be recommended, e.g. "the melody of instrument B in audio A first holds back and then rises, conveying the producer's cheerful mood". If the current time period is the morning, the recommended voice includes a sentence related to the morning, e.g. a background sound of birds chirping at dawn. If the current location is a valley, the recommended voice includes an introduction relating the scenic spot to the audio to be recommended, e.g. "audio B has a graceful, winding melody, and in a valley you can feel the atmosphere it creates all the more vividly". In addition, the device can look up in the database text that matches at least two of the current emotion, the current time period and the current location, and convert that text into part of the recommended voice.
The target information represents the user's current scene, and the audio information represents basic information about the audio to be recommended, such as its lyrics, melody and background. From this scene and basic information, the device can generate a recommended voice that fits the user's current state and introduces the audio to the user in detail.
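A minimal sketch of how such a recommendation script might be assembled from the two inputs is shown below. The field names and template sentences are hypothetical; the disclosure does not prescribe a data schema.

```python
# Hypothetical assembly of a recommendation script from audio information
# and target information (emotion / time period / location).
def build_recommendation_text(audio_info: dict, target_info: dict) -> str:
    parts = [f"{audio_info['title']} was produced by {audio_info['producer']}."]
    if target_info.get("emotion") == "cheerful":
        parts.append("Its upbeat passages match your current mood.")
    if target_info.get("time_period") == "morning":
        parts.append("Its opening birdsong suits the start of the day.")
    if target_info.get("location"):
        parts.append(f"It pairs well with the scenery around {target_info['location']}.")
    return " ".join(parts)
```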
In step S203, in response to the voice introduction operation of the audio to be recommended, the recommended voice is played.
After generating the recommended voice, the device stores it in association with the identifier of the audio to be recommended. The device can display information about the audio to be recommended on the display interface; if the device detects a user operation on that information, i.e. a voice introduction operation on the audio to be recommended, it plays the recommended voice. From the recommended voice the user learns why the audio was recommended and hears a related introduction, coming to know the audio, which serves to attract the user. Because the recommended voice also contains speech related to the user's current emotion, current time period and current location, the audio matches the user's current state, further increasing its appeal. In addition, if the device is a mobile terminal, the user can still obtain this information through the recommended voice even when the screen is off while the audio plays. Through the recommended voice the user can learn the story behind the audio and related background knowledge; it also offers the user a sense of companionship, adds interest, and creates an emotional resonance between the user and the audio, further increasing the appeal of the audio to be recommended.
In this embodiment, the recommended voice is generated from the audio information of the audio to be recommended together with the user's current emotion, current time period, current location and other information, so that the played recommended voice matches where the user is and how the user feels at the moment. This arouses the user's interest in the audio to be recommended, improves the recommendation effect, and lets the user hear an in-depth interpretation of the audio without interrupting playback, greatly improving the listening experience.
Referring to fig. 3, fig. 3 is a schematic flow chart schematically illustrating another embodiment of an audio recommendation method according to an embodiment of the disclosure, based on the embodiment illustrated in fig. 2, step S202 includes:
in step S301, it is determined whether each audio to be recommended is the same type of audio according to the audio information of each audio to be recommended.
In this embodiment, the device may recommend multiple audio items to the user. The plurality of audio to be recommended may be the same type of audio or different types of audio. If the types of the plurality of audio to be recommended are the same, the recommended voice can comprise the introduction corresponding to the type.
In this regard, the device determines, from the audio information of each audio to be recommended, whether the audios to be recommended are of the same type. The audio information includes the type of the audio to be recommended, where the type may refer to the audio's genre, and the device can determine whether the audios are of the same type based on the type in each one's audio information.
Step S302, in response to the audios to be recommended being of the same type, acquiring the theme text corresponding to the type to which the audios to be recommended belong.
When the audios to be recommended are of the same type, the theme text corresponding to that type can be acquired; the theme text is a text introducing that type of audio. The theme text can be stored in the database in association with the type, i.e. the device retrieves the theme text from the database based on the type of each audio to be recommended.
Step S303, generating a first recommended text according to the target information and the theme text.
After obtaining the theme text, the device generates a recommended text, defined as the first recommended text, based on the target information and the theme text.
Step S304, converting the first recommended text into recommended voice.
In an example, the target information includes at least one of the user's current emotion, current time period and current location. The device finds, in the text library, text matching at least one of these factors and modifies the theme text based on the matched text to obtain the first recommended text; for example, the matched text may be inserted directly into the theme text to obtain the first recommended text.
In another example, preset information corresponding to the user is configured in the device, the preset information including at least one of a preset emotion, a preset time period and a preset location. The device determines whether the target information matches the preset information, i.e. whether the current emotion is similar to the preset emotion, whether the current time period falls within the preset time period, and whether the current location lies within the area of the preset location. If the current emotion is similar to the preset emotion, the current time period falls within the preset time period, and the current location lies within the area of the preset location, the target information matches the preset information, and the device generates the first recommended text based on the target information and the theme text. Two emotions are similar when both are negative or both are positive; for example, if the current emotion is anger and the preset emotion is sadness, both are negative emotions, so the two are similar. If the current emotion is not similar to the preset emotion, the current time period does not fall within the preset time period, or the current location does not lie within the area of the preset location, the target information does not match the preset information, and the device takes the theme text as the first recommended text, i.e. converts the theme text into the recommended voice.
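The matching rule described above can be sketched as follows. The emotion polarity sets, the hour-based time check and the planar distance approximation are all illustrative assumptions.

```python
import math

NEGATIVE = {"anger", "sadness", "fright"}  # assumed polarity grouping

def same_polarity(a: str, b: str) -> bool:
    # Emotions are "similar" when both are negative or both are positive.
    return (a in NEGATIVE) == (b in NEGATIVE)

def km_between(p1, p2):
    # Rough planar approximation, adequate for "is the user near the preset spot".
    dlat = (p1[0] - p2[0]) * 111.0
    dlon = (p1[1] - p2[1]) * 111.0 * math.cos(math.radians(p1[0]))
    return math.hypot(dlat, dlon)

def matches_preset(target: dict, preset: dict) -> bool:
    return (same_polarity(target["emotion"], preset["emotion"])
            and preset["start_hour"] <= target["hour"] <= preset["end_hour"]
            and km_between(target["position"], preset["position"]) <= preset["radius_km"])
```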
In addition, when the audios to be recommended are not of the same type, the device generates the second recommended text from the target information alone and converts the second recommended text into the recommended voice; that is, the device finds matched text and generates the recommended voice from it. Obtaining the matched text is described above and is not repeated here.
Further, when generating the second recommended text from the target information, the device acquires the preset information and judges whether the preset information matches the target information. If the preset information matches the target information, the device generates the second recommended text based on the target information; if not, the device determines a generic voice as the recommended voice. The generic voice is a default recommended voice stored in the device. Determining whether the preset information matches the target information is described above and is not repeated here.
In this embodiment, the device determines from the audio information whether the audios to be recommended are of the same type. If so, it obtains the theme text corresponding to that type and generates the recommended voice from the theme text and the target information; in other words, one theme text serves as the audio introduction for all the audios to be recommended, so a recommended voice need not be generated from each audio's individual background, which improves generation efficiency.
Referring to fig. 4, fig. 4 is a schematic flow chart schematically illustrating a further embodiment of an audio recommendation method according to an embodiment of the disclosure, and before step S302, further includes:
step S401, obtaining a curved wind keyword corresponding to a target type and description information of the target type, where the target type is a type to which each audio to be recommended belongs.
And step S402, generating a theme text corresponding to the target type according to the song keywords and the description information.
In this embodiment, the device may directly generate the theme text corresponding to each type. The device acquires the genre keyword corresponding to a target type and the description information of the target type, the target type being the type to which each audio to be recommended belongs. The genre keyword may be entered by operators in the background and indicates the type of the audio, e.g. the genre keyword is "rock". The description information may describe the text style corresponding to the target type, e.g. the description information is "cheerful".
The device generates the theme text corresponding to the target type based on the genre keyword and the description information. In an example, the genre keyword is "rock" and the description information is "cheerful"; the theme text is then a text on the theme of "cheerful rock", i.e. an introduction to "cheerful rock".
In another example, a text generation model is deployed in the device. The device inputs the genre keyword and the description information into the text generation model, which generates the theme text from them, and the device obtains the theme text corresponding to the target type output by the model. The text generation model may be an AIGC (AI Generated Content) model.
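A sketch of this generation step follows; the `generate` callable stands in for whatever text generation model is deployed and is an assumed interface, not an API named by the disclosure.

```python
# Hypothetical prompt construction for the theme-text generation step.
def build_theme_prompt(genre_keyword: str, description: str) -> str:
    return (f"Write a short spoken introduction to {description} {genre_keyword} music, "
            f"suitable for reading aloud before a playlist of that genre.")

def generate_theme_text(generate, genre_keyword: str, description: str) -> str:
    # `generate` is any callable mapping a prompt string to generated text.
    return generate(build_theme_prompt(genre_keyword, description))
```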
Further, the theme text generated by the device or the model may not meet the operators' expectations and may need revision. To this end, the device or the text generation model generates a text to be corrected from the genre keyword and the description information, and then outputs the text to be corrected together with prompt information. The prompt information indicates that the text to be corrected should be revised, i.e. the text is tuned manually, and the device determines the revised text as the theme text corresponding to the target type.
In this embodiment, using the genre keyword and the description information corresponding to the type to which each audio to be recommended belongs, the device accurately generates the theme text for that type.
Referring to fig. 5, fig. 5 is a schematic flow chart schematically illustrating still another embodiment of an audio recommendation method according to an embodiment of the present disclosure, and before step S301, further includes:
in step S501, audio preference information of the user is acquired, and each audio to be determined of the user preference is determined according to the audio preference information.
In this embodiment, the device needs to select, from a large pool of audio, the audio that may interest the user as the audio to be recommended. Illustratively, the device obtains the user's audio preference information and, based on it, determines the candidate audios to be determined that match the user's preference. For example, the preference information may show that the user likes the rock and country genres, in which case audios of those types are taken as audio to be determined. The audio preference information may be at least one of the user's historical preference, determined from the user's historical audio play information, and the user's real-time preference, determined from the audio list currently being played.
Further, since there is a great deal of audio, selecting on the preference information alone would still leave a large number of audios to be determined. The device therefore gathers candidates through several recall channels, where recall refers to retrieving candidate audio. For example, the device may obtain part of the audio from the user's playback behavior and take trending audio as another part. From this pool, the device extracts the audio the user prefers, based on the audio preference information, as the audio to be determined.
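A sketch of such multi-channel recall is given below; the channel names and record shapes are assumptions for illustration.

```python
# Illustrative multi-channel recall: merge candidates from several sources,
# then keep only those matching the user's preferred genres.
def recall_candidates(history_audio: list, trending_audio: list, preferred_genres: set) -> list:
    pool = {a["id"]: a for a in history_audio + trending_audio}  # de-duplicate by id
    return [a for a in pool.values() if a["genre"] in preferred_genres]
```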
Step S502, ranking the user preference of each audio to be determined.
After the audios to be determined are obtained, the device ranks them by user preference. The user preference can be measured by the number of times the user has clicked and played an audio to be determined; the more times, the higher the preference. In addition, the number of times the user clicks audio similar to an audio to be determined may be converted into clicks on that audio. For example, if the audio to be determined is audio 1, whose type is cheerful rock, and the user has clicked cheerful-rock audio 100 times, those 100 clicks may be converted into, say, 70 clicks on audio 1. Likewise, the number of times the user plays similar audio may be converted into plays of the audio to be determined.
The device sorts the audios to be determined by their user preference, either in descending or in ascending order of preference.
In step S503, each audio to be recommended is determined in the sorted audio to be determined.
The device determines the audios to be recommended among the sorted audios to be determined. In an exemplary embodiment, the device selects the first audios from the sorted audios to be determined, where a first audio's user preference is higher than that of the other audios, the other audios being the audios to be determined that were not selected as first audio. The device obtains each first audio's target parameters, which include at least one of the hit rate, click rate, estimated play duration and estimated save rate of the first audio. The hit rate is the probability that the user clicks the audio; the click rate is the ratio of the number of times the audio is clicked to the number of times it is displayed; the estimated play duration is the predicted length of time the user will play the audio; and the estimated save rate is the probability that the user saves the audio. The target parameters can be obtained from a model trained on the user's audio play information, which includes the number of times the user clicks audio, the audio's type, its play duration, whether it was saved, and so on. The device inputs the first audio's audio information and the user's audio preference information into the model to obtain the first audio's target parameters. After obtaining the target parameters, the device determines the audios to be recommended from the first audios whose target parameters are greater than a preset threshold.
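The selection logic of steps S502 and S503 can be sketched as below; the score fields, top-k cut and threshold value are illustrative assumptions.

```python
# Rank candidates by preference, take the top slice as "first audio",
# then keep those whose predicted engagement metrics clear a threshold.
def select_first_audio(candidates: list, top_k: int = 50, threshold: float = 0.3) -> list:
    ranked = sorted(candidates, key=lambda a: a["preference_score"], reverse=True)
    first_audio = ranked[:top_k]
    # Target parameters here: hit rate, click rate and save rate (all in [0, 1]).
    return [a for a in first_audio
            if min(a["hit_rate"], a["click_rate"], a["save_rate"]) > threshold]
```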
In an example, the first audios whose target parameters are greater than the threshold may be taken directly as the audios to be recommended.
In another example, the device needs to control the diversity of the audio to be recommended, i.e. recommend audio that is as varied as possible, which means limiting the number of audios of the same kind: among audios with the same attribute, only one or a few are kept. To this end, the device determines, among the second audios, the third audios that share the same attribute, where a second audio is a first audio whose target parameter is greater than the preset threshold, and the attribute includes at least one of producer, audio identifier and type. That is, several audios made by the same producer share an attribute, different versions of the same audio share an attribute, and audios of the same type share an attribute. The device deletes at least one audio from each group of third audios sharing the same attribute, thereby obtaining the audios to be recommended.
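This diversity control can be sketched as a simple group-and-truncate pass; the grouping key and the `keep` count are assumptions for illustration.

```python
from collections import defaultdict

# Group "second audio" by a shared attribute (producer / audio id / type)
# and keep at most `keep` tracks per group, dropping the rest.
def dedupe_by_attribute(second_audio: list, attr: str = "producer", keep: int = 1) -> list:
    groups = defaultdict(list)
    for a in second_audio:
        groups[a[attr]].append(a)
    recommended = []
    for tracks in groups.values():
        recommended.extend(tracks[:keep])
    return recommended
```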
In this embodiment, because there are many audios to be determined, they must be sorted, and the audios to be recommended are selected from the sorted candidates. This reduces the number of audios recommended to the user and avoids the negative feelings, such as annoyance, that recommending too much audio at once can produce.
Referring to fig. 6, fig. 6 schematically illustrates a flowchart of a further embodiment of an audio recommendation method according to an embodiment of the disclosure. Based on the embodiment illustrated in fig. 2, step S202 includes:
In step S601, comment information, introduction information, and lyrics and melody information of the audio to be recommended are acquired according to the audio information.
In this embodiment, the device recommends audio to the user one track at a time, and therefore needs to introduce the audio to be recommended in detail. To this end, the device acquires the comment information, introduction information, and lyrics and melody information of the audio to be recommended from the audio information. The comment information refers to comments on the audio to be recommended from its comment section; these may be hot comments, for example comments whose number of likes is greater than a preset number. The introduction information includes the production background, application background, and the like of the audio to be recommended. The lyrics and melody information comprises the melody or lyrics of the audio.
In step S602, an introduction text of the audio to be recommended is generated according to the comment information, the introduction information, and the lyrics and melody information.
The device generates the introduction text of the audio to be recommended from the comment information, the introduction information, and the lyrics and melody information; that is, the introduction text covers the comments, background, lyrics, melody, and the like of the audio to be recommended.
In an example, a comment generation model may be provided in the device. The device inputs the comment information, introduction information, and lyrics and melody information into the comment generation model to obtain the introduction text of the audio to be recommended output by the model. The comment generation model is trained on the comment information, introduction information, and lyrics and melody information of the audios in a database. Alternatively, the comment information may first be input into the comment generation model to obtain one or more preferred comment texts, and the model then generates the final introduction text from the preferred comment texts, the lyrics and melody information, and the introduction information.
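Since the comment generation model is described only functionally, the sketch below stands in a generic text-generation callable; the prompt layout, the field names, and the like threshold of 1000 are illustrative assumptions, not details from the embodiment:

```python
from typing import Callable, Dict, List

def build_introduction_text(
    comments: List[Dict],            # e.g. [{"text": "...", "likes": 2300}, ...]
    introduction_info: str,          # production/application background
    lyrics_and_melody: str,
    generate: Callable[[str], str],  # assumed interface to the generation model
    min_likes: int = 1000,           # assumed preset number of likes
) -> str:
    """Select hot comments (likes above a preset number), then ask the model
    for an introduction text weaving together comments, background, and
    lyrics and melody information."""
    hot = [c["text"] for c in comments if c.get("likes", 0) > min_likes]
    prompt = (
        f"Background: {introduction_info}\n"
        f"Lyrics/melody: {lyrics_and_melody}\n"
        f"Hot comments: {' | '.join(hot[:3])}\n"
        "Write a short spoken introduction for this song."
    )
    return generate(prompt)
```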
In step S603, the introduction text is modified according to the target information to obtain a third recommended text, and the third recommended text is converted to obtain the recommended voice.
After the introduction text is obtained, the device modifies it based on the target information to obtain a third recommended text, and then performs voice conversion on the third recommended text to obtain the recommended voice. For example, a text matched with the target information may be obtained; for obtaining the matched text, refer to the description above, which is not repeated here. The device then revises the introduction text based on the matched text, for example by splicing sentences from the matched text into the introduction text in a conversational tone, which completes the modification of the introduction text.
In this embodiment, the device generates the introduction text from the comment information, introduction information, and lyrics and melody information of the audio to be recommended, so that the user can fully understand the audio to be recommended through the introduction text, which improves the appeal of the audio to be recommended to the user.
Referring to fig. 7, fig. 7 schematically illustrates a flowchart of an embodiment of an audio recommendation method according to an embodiment of the disclosure. Based on the embodiment illustrated in any one of fig. 2 to 6, step S203 includes:
In step S701, the currently running play mode is acquired.
In this embodiment, the device provides two play modes: a detailed introduction mode and a simple introduction mode. When the recommended voice is generated, a detailed recommended voice and a simple recommended voice are generated for the two modes; that is, the recommended voice comprises a detailed recommended voice and a simple recommended voice. The device obtains the recommended text based on the target information and the audio information, simplifies the recommended text to obtain a simple text, converts the recommended text into speech to obtain the detailed recommended voice, and converts the simple text into speech to obtain the simple recommended voice.
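A sketch of generating both variants up front, assuming stand-in `summarize` and `tts` callables for the simplification and speech-synthesis components, which the embodiment does not specify:

```python
from typing import Callable, Dict

def make_recommended_voices(
    recommended_text: str,
    summarize: Callable[[str], str],  # assumed text-simplification component
    tts: Callable[[str], bytes],      # assumed text-to-speech component
) -> Dict[str, bytes]:
    """Generate both variants: the detailed voice from the full recommended
    text, the simple voice from its simplified form."""
    simple_text = summarize(recommended_text)
    return {
        "detailed": tts(recommended_text),
        "simple": tts(simple_text),
    }
```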
The user can set the mode of the device, i.e., set whether the device runs the detailed introduction mode or the simple introduction mode. Illustratively, a mode switching button is provided on the display interface of the device, and the user can switch the broadcast mode by clicking it. When the device detects the voice introduction operation on the audio to be recommended, it acquires the currently running play mode.
In step S702, in response to the play mode being the detailed introduction mode, the detailed recommended voice is played.
In step S703, in response to the play mode being the simple introduction mode, the simple recommended voice is played.
When the play mode is the detailed introduction mode, the device plays the detailed recommended voice; if the play mode is the simple introduction mode, it plays the simple recommended voice.
In this embodiment, the device acquires the currently running play mode and plays the detailed recommended voice if the play mode is the detailed introduction mode, or the simple recommended voice if the play mode is the simple introduction mode; that is, the device plays the recommended voice that matches the user's needs.
In an embodiment, the audio to be recommended can be introduced simply if the user has already listened to it, and in detail if the user has not, so that the device recommends audio to the user intelligently. Specifically, the device acquires the user's audio playing information, which includes a history of the audio the user has played, and determines from this history whether the user has played the audio to be recommended. If it is determined that the user has played the audio to be recommended, the simple recommended voice is played; if it is determined that the user has not played the audio to be recommended, the detailed recommended voice is played. For the generation of the detailed and simple recommended voices, refer to the description above, which is not repeated here.
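One possible way to combine the mode switch with this history-based embodiment is sketched below; the patent presents them as separate embodiments, so the priority given here to an explicit user setting is an assumption:

```python
from typing import Dict, Set

def choose_voice(
    play_mode: str,             # "detailed", "simple", or "" if no mode is set
    played_history: Set[str],   # ids of audios the user has already played
    audio_id: str,
    voices: Dict[str, bytes],   # output of make_recommended_voices above
) -> bytes:
    """An explicitly selected play mode wins; otherwise fall back to the
    history rule: simple voice for audio already heard, detailed otherwise."""
    if play_mode in voices:
        return voices[play_mode]
    return voices["simple"] if audio_id in played_history else voices["detailed"]
```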
Referring to fig. 8, fig. 8 schematically illustrates a flowchart of a further embodiment of an audio recommendation method according to an embodiment of the disclosure. Based on the embodiment shown in any one of fig. 2 to fig. 7, after step S203, the method further includes:
In step S801, the sound wave of the recommended voice being played is acquired.
In step S802, the image in the display screen is controlled to transform based on the fluctuation of the sound wave.
In this embodiment, when the device plays the recommended voice, an image is displayed on the display screen, and the sound wave of the recommended voice being played is acquired. The device controls the image in the display screen to transform based on the fluctuations of the sound wave.
In an example, the device controls an object in the image to move up and down with the changes of the sound wave; for example, if the sound wave rises, the displayed object moves upward, and if the sound wave falls, the displayed object moves downward.
In another example, an object in the image may be deformable, and the object is controlled to deform with the changes of the sound wave; for example, the object is expanded as the sound wave rises and contracted as it falls.
In this embodiment, the device controls the image in the display screen to change with the sound wave of the recommended voice; that is, the changes of the sound wave are visualized through the changes of the image, which makes audio playback more engaging.
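A sketch of one possible mapping from sound-wave amplitude to the image transform, combining the movement and deformation examples above; the normalization convention and the gain constants are assumptions:

```python
def wave_to_transform(amplitude: float, baseline_y: float,
                      base_scale: float = 1.0) -> tuple:
    """Map the current amplitude (assumed normalized to [0, 1], with 0.5 as
    the neutral level) to a vertical offset and a scale factor: a higher
    wave moves the object up and expands it, a lower wave does the opposite."""
    delta = amplitude - 0.5
    offset_y = baseline_y - delta * 40.0      # pixels; gain is illustrative
    scale = base_scale * (1.0 + delta * 0.4)  # expansion / contraction
    return offset_y, scale
```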
In an embodiment, the device recommends a plurality of audios to be recommended to the user, and the audios to be recommended form an ordered recommendation list, defined as the first recommendation list. The audios to be recommended may be ordered in descending order of the user's preference for each of them.
Before playing the audios to be recommended in the recommendation list, the device determines whether each audio to be recommended is associated with audio introduction information, and if so, the ordering of that audio in the recommendation list can be changed. For example, if an audio C is associated with audio introduction information, its play order in the first recommendation list is promoted to obtain a second recommendation list, for example by moving it to the first position in the first recommendation list. If multiple audios to be recommended are associated with audio introduction information, their play order needs to be placed before the audios to be recommended that are not associated with audio introduction information. After the second recommendation list is obtained, the device sequentially plays the recommended voices of the audios in the second recommendation list.
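A minimal sketch of building the second recommendation list, again reusing the `Audio` type from the earlier sketch; a stable partition preserves the preference order within each group, which satisfies the ordering requirement described above:

```python
from typing import List, Set

def build_second_list(first_list: List[Audio],
                      has_intro: Set[str]) -> List[Audio]:
    """Move audios associated with audio introduction information ahead of
    those that are not, each group keeping its original preference order."""
    with_intro = [a for a in first_list if a.audio_id in has_intro]
    without_intro = [a for a in first_list if a.audio_id not in has_intro]
    return with_intro + without_intro
```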
In this embodiment, the device preferentially plays the audios to be recommended that are associated with audio introduction information, so that the user hears the audios with an audio introduction first. This improves the user's first impression of the audios to be recommended and thereby attracts the user to keep listening to the subsequent audios.
In an embodiment, the user may enter the DJ play mode of the audio to be recommended in various ways, where the DJ play mode refers to a play mode for the recommended voice of the audio to be recommended; in the DJ play mode, the recommended voice is played. When the device detects a play operation on the audio to be recommended, it outputs a playing interface, and when it detects a click operation on the voice introduction function on the playing interface, it plays the recommended voice; that is, the voice introduction operation comprises the click operation.
In one example, a card is provided on the display interface of the device and can be regarded as a trigger button for the DJ play mode. When the user clicks the card to trigger the DJ play mode, the device outputs a playing interface on which a voice introduction button for the audio to be recommended is provided; if the voice introduction button is clicked, the recommended voice is played.
In another example, a toggle button is provided above the audio player of the device; clicking the toggle button enters the DJ play mode, and clicking it again exits the DJ play mode.
After the recommended voice finishes playing, the device automatically enters the audio interface of the audio to be recommended, where the user can perform audio interactions such as switching and collecting audio.
In this embodiment, the user plays the recommended voice by operating on the audio to be recommended on the device's playing interface, and can thereby quickly learn the recommendation reason, creation background, and other information of the audio to be recommended through the recommended voice.
Exemplary Medium
Having described the method of the exemplary embodiments of the present disclosure, next, a storage medium of the exemplary embodiments of the present disclosure will be described with reference to fig. 9.
Referring to fig. 9, a storage medium 90 stores a program product for implementing the above-described method according to an embodiment of the present disclosure. The program product may employ a portable compact disc read-only memory (CD-ROM) and include computer-executable instructions for causing a computing device to perform the audio recommendation method provided by the present disclosure. However, the program product of the present disclosure is not limited thereto.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The readable signal medium may include a data signal propagated in baseband or as part of a carrier wave in which the computer-executable instructions are carried. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium.
Computer-executable instructions for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-executable instructions may be executed entirely on the user computing device, partly on the user device, partly on the remote computing device, or entirely on the remote computing device or server. In the context of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
Exemplary apparatus
Having described the medium of the exemplary embodiments of the present disclosure, an audio recommendation apparatus of the exemplary embodiments of the present disclosure, which is used to implement the method in any of the above-described audio recommendation method embodiments with similar implementation principles and technical effects, will next be described with reference to fig. 10.
Referring to fig. 10, fig. 10 schematically illustrates a structural diagram of an audio recommendation apparatus provided according to an embodiment of the present disclosure.
As shown in fig. 10, the audio recommendation apparatus includes: an obtaining module 1010, configured to obtain audio information of audio to be recommended and target information of a user, where the target information includes at least one of a current emotion, a current time period, and a current position of the user; a generating module 1020 for generating recommended voices according to the audio information and the target information; and a playing module 1030, configured to play the recommended voice in response to a voice introduction operation of the audio to be recommended.
In one embodiment, the generating module 1020 includes: a first determining unit used for determining whether each audio to be recommended is of the same type according to the audio information of each audio to be recommended; a first acquisition unit used for acquiring, in response to each audio to be recommended being of the same type, a theme text corresponding to the type to which each audio to be recommended belongs; a first generating unit used for generating a first recommended text according to the target information and the theme text; and a first conversion unit used for converting the first recommended text into recommended voice.
In an embodiment, the first generating unit includes: a first acquisition subunit used for acquiring preset information corresponding to the user, wherein the preset information includes at least one of a preset emotion, a preset time period, and a preset position; and a first generation subunit used for generating, in response to the target information matching the preset information, a first recommended text according to the target information and the theme text.
In an embodiment, the first generating unit further includes: a conversion subunit used for converting the theme text into recommended voice in response to the target information not matching the preset information.
In one embodiment, the generating module 1020 further includes: a second generating unit used for generating, in response to the audios to be recommended not being of the same type, a second recommended text according to the target information and converting the second recommended text into recommended voice.
In an embodiment, the second generating unit includes: a second acquisition subunit used for acquiring preset information corresponding to the user, wherein the preset information includes at least one of a preset emotion, a preset time period, and a preset position; and a second generation subunit used for generating, in response to the target information matching the preset information, a second recommended text according to the target information.
In an embodiment, the second generating unit further includes: a first determining subunit used for determining a universal voice as the recommended voice in response to the target information not matching the preset information.
In one embodiment, the generating module 1020 further comprises: a second acquisition unit used for acquiring a genre keyword corresponding to a target type and description information of the target type, wherein the target type is the type to which each audio to be recommended belongs; and a second generation unit used for generating the theme text corresponding to the target type according to the genre keyword and the description information.
In an embodiment, the second generating unit includes: a first input subunit used for inputting the genre keyword and the description information into a text generation model to obtain the theme text corresponding to the target type output by the text generation model.
In an embodiment, the second generating unit includes: a second generation subunit used for generating a text to be corrected according to the genre keyword and the description information; an output subunit used for outputting the text to be corrected and prompt information, wherein the prompt information is used to prompt correction of the text to be corrected; and a second determining subunit used for determining the corrected text as the theme text corresponding to the target type.
In one embodiment, the generating module 1020 further comprises: a third obtaining unit used for obtaining audio preference information of the user and determining, according to the audio preference information, each audio to be determined that the user prefers; a sorting unit used for sorting the audios to be determined by user preference; and a second determining unit used for determining each audio to be recommended from the sorted audios to be determined.
In an embodiment, the second determining unit includes: a third determining subunit used for determining each first audio among the sorted audios to be determined, wherein the user preference of each first audio is higher than that of the other audios, the other audios being the audios to be determined that are not determined to be first audios; a third acquisition subunit used for acquiring target parameters of each first audio, the target parameters including at least one of the hit rate, click rate, estimated play duration, and estimated collection rate of the first audio; and a fourth determining subunit used for determining the audios to be recommended from the first audios whose target parameters are greater than the preset threshold.
In an embodiment, the fourth determining subunit comprises: a determining component used for determining, among the second audios, the third audios having the same attribute, wherein a second audio is a first audio whose target parameter is greater than the preset parameter, and the attribute includes at least one of producer, audio identifier, and type; and an acquisition component used for deleting at least one audio from each group of third audios having the same attribute to obtain the audios to be recommended.
In one embodiment, the generating module includes: a fourth acquisition unit used for acquiring comment information, introduction information, and lyrics and melody information of the audio to be recommended according to the audio information; a third generation unit used for generating an introduction text of the audio to be recommended according to the comment information, the introduction information, and the lyrics and melody information; and a modification unit used for modifying the introduction text according to the target information to obtain a third recommended text and converting the third recommended text to obtain recommended voice.
In an embodiment, the third generating unit includes: a second input subunit used for inputting the comment information, the introduction information, and the lyrics and melody information into the comment generation model to obtain the introduction text of the audio to be recommended output by the comment generation model.
In one embodiment, the recommended voice includes a detailed recommended voice and a simple recommended voice, and the playing module 1030 includes: a fifth obtaining unit used for obtaining the currently running play mode; a first playing unit used for playing the detailed recommended voice in response to the play mode being the detailed introduction mode; and a second playing unit used for playing the simple recommended voice in response to the play mode being the simple introduction mode.
In one embodiment, the playing module 1030 includes: a sixth acquisition unit used for acquiring audio playing information of the user; a third playing unit used for playing the simple recommended voice in response to determining from the audio playing information that the user has played the audio to be recommended; and a fourth playing unit used for playing the detailed recommended voice in response to determining from the audio playing information that the user has not played the audio to be recommended.
In one embodiment, the playing module 1030 further includes: a seventh acquisition unit used for acquiring the sound wave of the recommended voice being played; and a control unit used for controlling the image in the display screen to transform based on the fluctuation of the sound wave.
In one embodiment, the playing module 1030 includes: a processing unit used for promoting, in response to the audio to be recommended being associated with audio introduction information, the play order of the audio to be recommended in the first recommendation list to obtain a second recommendation list, wherein the first recommendation list is a list obtained by sorting the audios to be recommended; and a fifth playing unit used for sequentially playing the recommended voices of the audios in the second recommendation list.
In an embodiment, the apparatus further comprises: a first output unit used for outputting a playing interface in response to a play operation on the audio to be recommended; and a second output unit used for playing the recommended voice in response to a click operation on the voice introduction function on the playing interface, wherein the voice introduction operation comprises the click operation.
Exemplary computing device
Having described the methods, media, and apparatus of exemplary embodiments of the present disclosure, a computing device of exemplary embodiments of the present disclosure is next described with reference to fig. 11.
The computing device 110 shown in fig. 11 is only one example and should not impose any limitation on the functionality or scope of use of embodiments of the present disclosure. As shown in fig. 11, computing device 110 takes the form of a general-purpose computing device. Components of computing device 110 may include, but are not limited to: at least one processing unit 1101, at least one storage unit 1102, and a bus 1103 that connects the different system components (including the processing unit 1101 and the storage unit 1102). The at least one storage unit 1102 stores computer-executable instructions; the at least one processing unit 1101 includes a processor that executes the computer-executable instructions to implement the method described above.
The bus 1103 includes a data bus, a control bus, and an address bus.
The storage unit 1102 may include readable media in the form of volatile memory, such as Random Access Memory (RAM) 11021 and/or cache memory 11022, and may further include readable media in the form of nonvolatile memory, such as Read Only Memory (ROM) 11023.
The storage unit 1102 may also include a program/utility 11025 having a set (at least one) of program modules 11024, such program modules 11024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Computing device 110 may also communicate with one or more external devices 1104 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 1105. Moreover, computing device 110 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet via network adapter 1106. As shown in fig. 11, network adapter 1106 communicates with other modules of computing device 110 over bus 1103. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with computing device 110, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of a terminal device/server are mentioned, such a division is only exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present disclosure. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Furthermore, although the operations of the methods of the present disclosure are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
While the spirit and principles of the present disclosure have been described with reference to several particular embodiments, it is to be understood that the disclosure is not limited to the particular embodiments disclosed, nor does the division into aspects imply that features in these aspects cannot be combined; this division is for convenience of description only. The disclosure is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (10)

1. An audio recommendation method, comprising:
acquiring audio information of audio to be recommended and target information of a user, wherein the target information comprises at least one of the current emotion, the current time period and the current position of the user;
generating recommended voice according to the audio information and the target information;
and playing the recommended voice in response to a voice introduction operation on the audio to be recommended.
2. The audio recommendation method according to claim 1, wherein the generating recommended voice according to the audio information and the target information comprises:
determining whether each audio to be recommended is of the same type according to the audio information of each audio to be recommended;
in response to each audio to be recommended being of the same type, acquiring a theme text corresponding to the type to which each audio to be recommended belongs;
generating a first recommended text according to the target information and the theme text;
and converting the first recommended text into recommended voice.
3. The audio recommendation method according to claim 2, wherein the generating a first recommended text according to the target information and the theme text comprises:
acquiring preset information corresponding to the user, wherein the preset information comprises at least one of preset emotion, preset time period and preset position;
and in response to the target information matching the preset information, generating a first recommended text according to the target information and the theme text.
4. The audio recommendation method according to claim 3, wherein after the acquiring of the preset information corresponding to the user, the method further comprises:
and converting the theme text into recommended voice in response to the target information not matching the preset information.
5. The audio recommendation method according to claim 2, wherein after the determining whether each audio to be recommended is of the same type according to the audio information of each audio to be recommended, the method further comprises:
in response to the audios to be recommended not being of the same type, generating a second recommended text according to the target information, and converting the second recommended text into recommended voice.
6. The audio recommendation method according to claim 5, wherein the generating a second recommended text according to the target information comprises:
acquiring preset information corresponding to the user, wherein the preset information comprises at least one of preset emotion, preset time period and preset position;
and in response to the target information matching the preset information, generating a second recommended text according to the target information.
7. The audio recommendation method according to claim 1, wherein the generating recommended voice according to the audio information and the target information comprises:
acquiring comment information, introduction information, and lyrics and melody information of the audio to be recommended according to the audio information;
generating an introduction text of the audio to be recommended according to the comment information, the introduction information, and the lyrics and melody information;
and modifying the introduction text according to the target information to obtain a third recommended text, and converting the third recommended text to obtain recommended voice.
8. An audio recommendation apparatus, comprising:
an acquisition module, configured to acquire audio information of audio to be recommended and target information of a user, wherein the target information comprises at least one of the current emotion, the current time period, and the current position of the user;
the generation module is used for generating recommended voice according to the audio information and the target information;
and the playing module is used for responding to the voice introduction operation of the audio to be recommended and playing the recommended voice.
9. A medium comprising computer-executable instructions which, when executed by a processor, implement the audio recommendation method according to any one of claims 1 to 7.
10. A computing device, comprising:
a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory, causing the processor to perform the audio recommendation method according to any one of claims 1 to 7.
Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination