CN113140230A - Method, device and equipment for determining pitch value of note and storage medium - Google Patents

Info

Publication number
CN113140230A
Authority
CN
China
Prior art keywords
lyric
note pitch
note
pitch information
lyric element
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110444040.5A
Other languages
Chinese (zh)
Other versions
CN113140230B (en)
Inventor
劳振锋
陈传艺
孙洪文
关迪聆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd filed Critical Guangzhou Kugou Computer Technology Co Ltd
Priority to CN202110444040.5A priority Critical patent/CN113140230B/en
Publication of CN113140230A publication Critical patent/CN113140230A/en
Application granted granted Critical
Publication of CN113140230B publication Critical patent/CN113140230B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00 Details of electrophonic musical instruments
    • G10H1/0033 Recording/reproducing or transmission of music for electrophonic musical instruments
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals
    • G10L2025/906 Pitch tracking

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Reverberation, Karaoke And Other Acoustics (AREA)

Abstract

The application discloses a note pitch value determining method, device, equipment and storage medium, which belong to the technical field of audio processing, and the method comprises the following steps: acquiring pitch information of lyric elements in a first song corresponding to the audio data; determining note pitch information of the corresponding lyric element based on the pitch information of each lyric element, wherein the note pitch information of the lyric element comprises pitch information corresponding to notes of the lyric element; obtaining note pitch information of a stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element; based on the target note pitch information for each lyric element, a note pitch value for the corresponding lyric element is determined. According to the embodiment of the application, note pitch information with large deviation caused by unstable singing object breath, unsmooth switching of different lyric elements and the like in note pitch information of the lyric elements is effectively removed, and the calculation accuracy of note pitch values of the lyric elements is improved.

Description

Method, device and equipment for determining pitch value of note and storage medium
Technical Field
The embodiment of the application relates to the technical field of audio processing, in particular to a note pitch value determining method, device, equipment and storage medium.
Background
With the development of singing voice synthesis technology, a variety of applications including acoustic models have been derived. The acoustic model trained by the audio data of a certain object when the object actually sings a song can be used for simulating the audio data of the object when the object sings other songs.
One important feature used in training the acoustic model is the note pitch feature of the word. Note pitch characteristics refer to the pitch characteristics of the notes of a word, which may indicate not only the height of the tone but also the length of the tone, and thus, can more accurately represent the singing characteristics of a word than the pitch characteristics of a word. Note pitch characteristics are usually expressed by note pitch values, and in the related art, extraction of a note pitch value of a word depends on extraction of a pitch value of a word. One scheme for extracting the pitch value of a note is: extracting the pitch information of the words, then obtaining the average value of the pitch information of the words to obtain the pitch value of the words, and converting the pitch value of the words into the note pitch value of the words. Another scheme for extracting the pitch value of a note is: extracting the pitch information of the words, converting the pitch information of the words into the note pitch information of the words, and then averaging the note pitch information of the words to obtain the note pitch value of the words.
In the process of implementing the application, the inventors found that problems such as unstable breath and unsmooth switching between different words are likely to occur during singing. As a result, part of the note pitch information of a word does not reflect the note pitch value that the object actually intends to sing, which causes a large deviation in the calculated note pitch value of the word.
Disclosure of Invention
The embodiment of the application provides a method, a device, equipment and a storage medium for determining note pitch values, which can be used for improving the calculation accuracy of note pitch values of words. The technical scheme is as follows:
in one aspect, an embodiment of the present application provides a method for determining a pitch value of a note, where the method includes:
acquiring pitch information of lyric elements in a first song corresponding to the audio data;
determining note pitch information for each of the lyric elements based on the pitch information of the lyric element, the note pitch information of the lyric element including pitch information corresponding to notes of the lyric element;
obtaining note pitch information of a stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element;
based on the target note pitch information for each of the lyric elements, a note pitch value for the respective lyric element is determined.
In another aspect, an embodiment of the present application provides an apparatus for determining a pitch value of a note, where the apparatus includes:
the first information acquisition module is used for acquiring pitch information of lyric elements in a first song corresponding to the audio data;
a second information determining module, configured to determine, based on pitch information of each of the lyric elements, note pitch information of the corresponding lyric element, where the note pitch information of the lyric element includes pitch information corresponding to a note of the lyric element;
the second information acquisition module is used for acquiring note pitch information of the stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element;
a note pitch value determination module for determining a note pitch value for each of the lyric elements based on the target note pitch information for the lyric element.
In yet another aspect, the present application provides a computer device, which includes a processor and a memory, where the memory stores a computer program, and the computer program is loaded and executed by the processor to implement the above-mentioned method for determining a pitch value of a note.
In yet another aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the above-mentioned method for determining a pitch value of a note.
In yet another aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the above-mentioned method for determining a pitch value of a note.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
the pitch information corresponding to the words in a song is extracted from the audio data of the song and converted into note pitch information of the words, which represents the singing characteristics of the words more accurately; the note pitch information of the words is then intercepted to obtain the note pitch information of the stable sounding parts of the words, and the note pitch values of the words are determined based on the note pitch information of the stable sounding parts. By intercepting the note pitch information of the words, the embodiment of the application effectively removes the note pitch information with large deviation that is caused by unstable breath of the singing object, unsmooth switching between different words and the like, so that the note pitch information of the words is closer to the note pitch value that the singing object actually intends to sing, and the calculation accuracy of the note pitch value of the words is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart of a method for determining the pitch value of a note according to one embodiment of the present application;
FIG. 2 is a flow chart of a method for determining the pitch value of a note according to another embodiment of the present application;
FIG. 3 is a block diagram of an apparatus for determining the pitch value of a note according to an embodiment of the present application;
FIG. 4 is a block diagram of an apparatus for determining the pitch value of a note according to another embodiment of the present application;
FIG. 5 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
In the technical scheme provided by the embodiment of the application, each step may be executed by a computer device, such as a server with computing capability, or a terminal such as a mobile phone, a tablet computer, a game console, an e-book reader, a multimedia playing device, a wearable device, a PC (Personal Computer), a smart television, a smart car, and the like, or by another computer device. Optionally, in the case that the computer device is implemented as a server, the computer device may be one server, a server cluster composed of a plurality of servers, or one cloud computing service center.
Before describing the technical scheme of the present application, some terms appearing in the embodiments of the present application are briefly described.
Lyric elements: refers to elements included in the lyrics of a song, such as words, phrases, etc. The types of lyric elements vary with the type of song. Songs include, but are not limited to, the following types: Chinese songs, English songs, Japanese songs, Korean songs, etc. Based on the type of the song, in the embodiment of the present application, the lyric elements include, but are not limited to, the following types: Chinese characters, English words, Japanese words, Korean symbols, and the like. Optionally, a song may be a mixture of multiple song types, in which case the lyric elements corresponding to the song may also be a mixture of multiple lyric element types. For example, when the song is a mixture of a Chinese song and an English song, the lyric elements corresponding to the song include both Chinese words and English words.
Pitch information of lyric elements: for reflecting the pitch characteristics of the lyric elements. When a song is sung by an object (such as a person, a robot, etc.), the height of the singing sound may change for a specific lyric element. Based on this, in the embodiment of the present application, each lyric element in the lyric of the song corresponds to at least one pitch information, and the at least one pitch information is used for reflecting the pitch characteristic of the corresponding lyric element, namely the height characteristic of the sound of the lyric element.
Note pitch information of lyric elements: for reflecting the note pitch characteristics of the lyric elements. When an object (e.g., a person, a robot, etc.) sings a song, the height of the singing sound may change within a specific lyric element, and the durations for which the different heights are sung may also differ. Based on this, the embodiment of the present application introduces note pitch information of lyric elements as a feature: each lyric element in the lyrics of a song corresponds to at least one piece of note pitch information, which reflects the note pitch characteristics of the corresponding lyric element, that is, both the height characteristics and the length characteristics of the notes of the lyric element. In some implementations, for a specific lyric element, one piece of note pitch information of the lyric element may correspond to one or more pieces of pitch information, which is not limited in the embodiments of the present application.
Note pitch value of lyric element: for reflecting the note pitch characteristics of the lyric elements. The difference between the note pitch information of a lyric element and the note pitch value of the lyric element is: note pitch information of the lyric element is not subjected to the normalization processing of the note pitch value determination method provided in the present application, and note pitch value of the lyric element is a note pitch value obtained after the note pitch information of the lyric element is subjected to the normalization processing of the note pitch value determination method provided in the present application. For a specific lyric element, the note pitch value of the lyric element may be a note pitch information in the note pitch information of the lyric element, or may be note pitch information re-determined based on the note pitch information of the lyric element.
Note pitch mode of lyric elements: refers to the target note pitch information that occurs the largest number of times in the target note pitch information of the lyric element.
Note pitch maximum of lyric element: refers to the target note pitch information with the largest note pitch value in the target note pitch information of the lyric element.
Stable vocal part of lyric elements: refers to the part in which an object (such as a person, a robot, etc.) vocalizes stably when singing a lyric element. When an object sings a specific lyric element in a song, its initial vocalization may be unstable because of the switch from the previous lyric element to the current one, unstable breath airflow, and the like. To obtain an accurate note pitch value of the lyric element, the unstable vocal part needs to be removed.
Please refer to FIG. 1, which shows a flowchart of a method for determining a pitch value of a note according to an embodiment of the present application. The method can be applied to the computer device described above. The method may include the following steps.
Step 110, obtaining pitch information of lyric elements in a first song corresponding to the audio data.
The first song is a song that the target object actually sings, and may include one song or a plurality of songs. In an embodiment of the application, the first song includes at least one lyric element, and each lyric element corresponds to at least one piece of pitch information. For an explanation of the lyric element and the pitch information of the lyric element, please refer to the term descriptions above, which are not repeated herein.
Corresponding audio data can be generated when the target object sings the first song, and the computer device can acquire the audio data. In one example, the computer device performs audio recording and other processing during the process of singing a first song by the target object, so as to obtain audio data corresponding to the first song; in another example, the computer device obtains audio data corresponding to the first song from another computer device, and the another computer device performs audio recording or the like during the process of singing the first song by the target object to obtain the audio data.
After obtaining the audio data corresponding to the first song, the computer device may extract pitch information of a lyric element in the first song from the audio data corresponding to the first song based on the lyric element included in the first song. In one example, the number of lyric elements corresponding to the pitch information extracted by the computer device from the audio data corresponding to the first song may be equal to the number of lyric elements included in the first song, that is, the computer device extracts the pitch information corresponding to each of the lyric elements included in the first song from the audio data corresponding to the first song. In another example, the number of lyric elements corresponding to the pitch information extracted from the audio data corresponding to the first song by the computer device may be less than the number of lyric elements included in the first song, that is, the computer device extracts the pitch information corresponding to some of the lyric elements included in the first song from the audio data corresponding to the first song.
Step 120, determining note pitch information of the corresponding lyric element based on the pitch information of each lyric element, wherein the note pitch information of the lyric element comprises the pitch information corresponding to notes of the lyric element.
Because note pitch information represents the singing characteristics of a lyric element more accurately than pitch information alone, in the embodiment of the present application, after extracting the pitch information of the lyric elements, the computer device further determines the note pitch information of the corresponding lyric element based on the pitch information of each lyric element, that is, the computer device converts the pitch information of each lyric element into the note pitch information of the corresponding lyric element. Illustratively, assume that the pitch information of the lyric element is f0 = [f01, f02, ..., f0n]. The note pitch information of the lyric element is then given by
[Formula image: Figure BDA0003036200450000061]
or, alternatively, by
[Formula image: Figure BDA0003036200450000062]
where A, B and C are all positive numbers; illustratively, A is 69.5, B is 12, and C is 440.
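The conversion in step 120 can be sketched as follows. The formula images are not available in this text, so the exact expression is an assumption: the sketch uses floor(A + B * log2(f0 / C)), which is the usual frequency-to-note-number mapping and is consistent with the example constants (with A = 69.5 the floor acts as rounding to the nearest semitone around note 69 at 440 Hz). The function names and the treatment of non-positive values as unvoiced frames are hypothetical.

```python
import math

# Example constants from the text: A = 69.5, B = 12, C = 440.
A, B, C = 69.5, 12, 440


def pitch_to_note_pitch(f0_hz, a=A, b=B, c=C):
    """Map one pitch value (Hz) to an integer note pitch.

    ASSUMPTION: the form floor(a + b * log2(f0 / c)) is not confirmed by
    the source; it is one common mapping that fits the given constants.
    """
    return math.floor(a + b * math.log2(f0_hz / c))


def to_note_pitch_info(f0):
    """Convert pitch information [f01, ..., f0n] to note pitch information.

    ASSUMPTION: non-positive entries mark unvoiced frames and are skipped.
    """
    return [pitch_to_note_pitch(f) for f in f0 if f > 0]
```

Under this assumed form, a frame at 440 Hz maps to note 69 and a frame at 880 Hz maps to note 81, one octave (12 semitones) higher.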
Step 130, obtaining the note pitch information of the stable sounding part of each lyric element to obtain the target note pitch information of the corresponding lyric element.
Since the target object sings the first song, conditions of unstable breath, unsmooth switching of different lyric elements and the like are likely to occur, so that the pitch information of a part of notes in the note pitch information of the lyric elements is not the note pitch value which the target object actually wants to sing to reach. Therefore, in the embodiment of the application, after obtaining the note pitch information of the lyric elements, the computer device intercepts the note pitch information of the stable sounding part of the corresponding lyric element from the note pitch information of each lyric element to obtain the target note pitch information of the corresponding lyric element, so that the deviations caused by unstable breath of the target object, unsmooth switching of different lyric elements and the like can be effectively compensated, and the note pitch information of the lyric elements is closer to the note pitch value actually wanted to be sung by the target object.
When the target object actually sings the first song, the singing follows a pattern: for a given lyric element, the breath is basically stable in the latter part of the sounding of the lyric element, and the pitch information extracted there is also stable. Thus, in one example, assuming that the sounding time period of each lyric element is a first time period and the time period of the stable sounding part of the corresponding lyric element is a second time period, the above step 130 comprises: taking a time within the first time period that is later than the starting time of the first time period as the starting time of the second time period; taking a time within the first time period that is earlier than or equal to the end time of the first time period as the end time of the second time period; and intercepting note pitch information of the stable sounding part of the corresponding lyric element from the note pitch information of each lyric element based on the starting time and the end time of the second time period, to obtain target note pitch information of the corresponding lyric element. Optionally, the starting time of the second time period is equal to the middle time of the first time period, or is earlier or later than the middle time of the first time period.
Take the case in which the lyric elements of the first song include Chinese characters. The sounding part of such a lyric element includes an initial part and a final part; the final part generally has corresponding pitch information, and the latter part of the sounding of the lyric element is generally its final part. Based on this, in one example in which the lyric elements comprise Chinese characters, the above step 130 comprises: acquiring note pitch information of the final part of each Chinese character to obtain target note pitch information of the corresponding Chinese character. That is, the stable sounding part of a Chinese character includes the final part of the Chinese character. Optionally, acquiring the note pitch information of the final part of each Chinese character to obtain the target note pitch information of the corresponding Chinese character includes: acquiring the note pitch information of a target part in the final part of each Chinese character to obtain the target note pitch information of the corresponding Chinese character, wherein the starting time of the target part is later than the starting time of the final part. That is, the stable sounding part of the Chinese character includes the target part in the final part of the Chinese character. Optionally, the end time of the target part is earlier than or equal to the end time of the final part. When the note pitch information of a lyric element is intercepted, the interception may also be based on a proportion of the sounding time: for example, the note pitch information corresponding to the last N% of the sounding time of the lyric element is intercepted as the target note pitch information, where N is a positive number. For example, N is any one of the following: 20, 30, 40, 50, 60, 70, 80.
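The proportional interception described above (keeping the last N% of the sounding time) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name is hypothetical, and it assumes the note pitch information is a list of evenly spaced frames, so that a share of the sounding time equals the same share of the entries.

```python
import math


def stable_part(note_pitch_info, n_percent=50):
    """Return the note pitch information for the last n_percent of a lyric element.

    ASSUMPTION: entries are evenly spaced in time, so the last n_percent of
    the sounding time is the last n_percent of the entries (rounded up).
    """
    if not 0 < n_percent <= 100:
        raise ValueError("n_percent must be in (0, 100]")
    total = len(note_pitch_info)
    keep = math.ceil(total * n_percent / 100)
    # Slice from the tail; an empty input yields an empty result.
    return note_pitch_info[total - keep:]
```

For instance, with N = 50 the call `stable_part([60, 61, 62, 62, 62, 62], 50)` discards the unstable onset frames 60, 61 and the first 62, and keeps the last three frames.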
Step 140, determining a note pitch value of the corresponding lyric element based on the target note pitch information of each lyric element.
After the interception process is performed to obtain the target note pitch information of each lyric element, the computer device may determine the note pitch value of the corresponding lyric element based on the target note pitch information of each lyric element. Optionally, the computer device may average the target note pitch information for each lyric element to obtain a note pitch value for the corresponding lyric element; alternatively, the computer device may determine a mode in the target note pitch information for each lyric element as the note pitch value of the corresponding lyric element; alternatively, the computer device may determine a maximum value in the target note pitch information for each lyric element as the note pitch value of the corresponding lyric element. For further description of the process of determining the pitch value of a note of each lyric element, please refer to the following embodiments, which are not repeated herein.
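The three alternatives named in this paragraph (average, mode, maximum) can be sketched with the Python standard library. The function name and the `method` parameter are hypothetical. Note that `statistics.mode` returns the first mode encountered when several values tie (Python 3.8+); that tie behaviour is a property of this sketch, not of the described method.

```python
from statistics import mean, mode


def note_pitch_value(target_note_pitch_info, method="mean"):
    """Reduce the target note pitch information of one lyric element to a single value."""
    if not target_note_pitch_info:
        raise ValueError("target note pitch information is empty")
    if method == "mean":
        return mean(target_note_pitch_info)
    if method == "mode":
        # Ties are resolved by first occurrence here (sketch behaviour only).
        return mode(target_note_pitch_info)
    if method == "max":
        return max(target_note_pitch_info)
    raise ValueError(f"unknown method: {method}")
```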
As can be seen from the above description, the note pitch value of the lyric element is an important feature in the singing voice synthesis technology, and can be used for training the acoustic model. Based on this, in an example, after the step 140, the method further includes: reconstructing audio data based on the note pitch values of the lyric elements in the first song to obtain reconstructed data of the first song; acquiring music score data of a first song; constructing a model training sample based on the music score data of the first song and the reconstruction data of the first song; training a simulation singing model through a model training sample; the simulation singing model is used for simulating audio data generated when the target object sings the second song according to the music score data of the second song.
Based on the simulation singing model, the singing voice (audio data) of the target object for the second song can be synthesized from the music score of the second song. The second song may be a song that has not been sung by the target object, that is, a song different from the first song; it may also be a song that the target object has already sung, that is, the second song may include the same song as the first song. Optionally, in the embodiment of the present application, the music score data of the first song is used as input and the reconstructed data of the first song is used as output, and the parameters of the simulation singing model are adjusted until the loss function of the simulation singing model converges, thereby completing the training of the simulation singing model.
In summary, according to the technical scheme provided by the embodiment of the application, the pitch information of the lyric elements in a song is extracted from the audio data of the song and converted into note pitch information of the lyric elements, which represents the singing characteristics of the lyric elements more accurately; the note pitch information of each lyric element is then intercepted to obtain the note pitch information of the stable sounding part of the lyric element, and the note pitch value of the lyric element is determined based on the note pitch information of the stable sounding part. By intercepting the note pitch information of the lyric elements, the embodiment of the application effectively removes the note pitch information with large deviation that is caused by unstable breath of the singing object, unsmooth switching between different lyric elements and the like, so that the note pitch information of the lyric elements is closer to the note pitch value that the singing object actually intends to sing, and the calculation accuracy of the note pitch value of the lyric elements is improved.
Next, a method for determining a note pitch value of a lyric element according to an embodiment of the present application will be described. In one example, the step 140 includes:
(1) based on the target note pitch information for each lyric element, a note pitch mode for the respective lyric element and a note pitch maximum for the respective lyric element are determined.
The note pitch mode of a lyric element is the target note pitch information that is repeated the largest number of times in the target note pitch information of the lyric element; the note pitch maximum of a lyric element is the target note pitch information with the largest note pitch value in the target note pitch information of the lyric element.
However, several different pieces of target note pitch information of a lyric element may be tied for the largest number of repetitions. In one example, in the case that multiple pieces of target note pitch information are tied for the largest number of repetitions, the one with the largest note pitch value is taken as the note pitch mode of the lyric element. For example, suppose that target note pitch information 1 and target note pitch information 2 are both repeated the most times in the target note pitch information of the lyric element, each appearing 3 times. Then, in the case that the note pitch value of target note pitch information 1 is greater than that of target note pitch information 2, target note pitch information 1 is taken as the note pitch mode of the lyric element; in the case that the note pitch value of target note pitch information 2 is greater than that of target note pitch information 1, target note pitch information 2 is taken as the note pitch mode of the lyric element.
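The tie-break rule above can be sketched as follows, assuming the target note pitch information is a list of integer note pitch values; the function name is hypothetical.

```python
from collections import Counter


def note_pitch_mode(target_note_pitch_info):
    """Return the most repeated value; among tied values, return the largest."""
    if not target_note_pitch_info:
        raise ValueError("target note pitch information is empty")
    counts = Counter(target_note_pitch_info)
    top_count = max(counts.values())
    # All values tied for the highest repetition count are candidates.
    return max(value for value, count in counts.items() if count == top_count)
```

In the example from the text, two values that each appear 3 times are tied, and the larger of the two is returned.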
(2) A note pitch value for the respective lyric element is determined based on a note pitch mode of the respective lyric element and a note pitch maximum of the respective lyric element.
After determining the note pitch mode of the lyric element and the note pitch maximum of the lyric element, the computer device may determine the note pitch mode of the lyric element as the note pitch value of the lyric element or determine the note pitch maximum of the lyric element as the note pitch value of the lyric element based on a magnitude relationship between the note pitch mode of the lyric element and the note pitch maximum of the lyric element, or the like. In the following, several possibilities are described separately.
In one example, step (2) includes: in the case that the note pitch mode of the lyric element is equal to the note pitch maximum of the lyric element, determining the note pitch mode of the lyric element as the note pitch value of the lyric element; or, in the case that the note pitch mode of the lyric element is equal to the note pitch maximum of the lyric element, determining the note pitch maximum of the lyric element as the note pitch value of the lyric element.
In another example, step (2) includes: in the case that the note pitch mode of the lyric element is not equal to the note pitch maximum of the lyric element, acquiring the index values corresponding to the note pitch maximum of the lyric element; determining, based on the index values corresponding to the note pitch maximum of the lyric element, whether the note pitch maximum of the lyric element occurs consecutively; in the case that the note pitch maximum of the lyric element occurs consecutively, determining the note pitch maximum of the lyric element as the note pitch value of the lyric element; and in the case that the note pitch maximum of the lyric element does not occur consecutively, determining the note pitch mode of the lyric element as the note pitch value of the lyric element.
Because the target object may sing some lyric elements with vibrato, and the note pitch value of the vibrato may be higher than the note pitch value that the target object actually intends to sing, it is necessary to avoid determining the target note pitch information corresponding to the vibrato as the note pitch value of the lyric element. Moreover, because vibrato generally occurs at intervals, whether the note pitch maximum of a lyric element occurs consecutively, and thus whether the note pitch maximum is vibrato, can be determined based on the index values of the note pitch maximum of the lyric element. If the note pitch maximum of the lyric element occurs consecutively, it is not vibrato, and the note pitch maximum of the lyric element can be determined as the note pitch value of the lyric element; if the note pitch maximum of the lyric element does not occur consecutively, it is vibrato, and the note pitch mode of the lyric element can be determined as the note pitch value of the lyric element.
The computer device may obtain the index values corresponding to the note pitch maximum of the lyric element while determining the note pitch maximum of the lyric element, and then determine, based on those index values, whether the note pitch maximum of the lyric element occurs consecutively. It should be understood that, in the case where the note pitch maximum of a lyric element occurs only once, the note pitch maximum of the lyric element is generally not considered to be vibrato.
Optionally, in this embodiment of the application, the index value corresponding to the pitch information of the lyric element is obtained by labeling the pitch information in order of occurrence time, from earliest to latest. Determining whether the note pitch maximum of the lyric element occurs consecutively based on the index values corresponding to the note pitch maximum of the lyric element includes: determining a maximum index value and a minimum index value among the index values corresponding to the note pitch maximum of the lyric element; adding one to the difference between the maximum index value and the minimum index value to obtain a comparison value; determining that the note pitch maximum of the lyric element occurs consecutively in the case that the number of pieces of target note pitch information corresponding to the note pitch maximum of the lyric element is equal to the comparison value; and determining that the note pitch maximum of the lyric element does not occur consecutively in the case that the number of pieces of target note pitch information corresponding to the note pitch maximum of the lyric element is not equal to the comparison value.
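The comparison-value test can be illustrated with a minimal Python sketch. The function name `max_occurs_consecutively` and the list-of-integers representation are assumptions made for illustration; index values here start at 0, which does not affect the difference between the maximum and minimum index.

```python
def max_occurs_consecutively(target_notes):
    """Check whether all occurrences of the largest note value are adjacent.

    target_notes: non-empty list of note values ordered by occurrence time
    (hypothetical representation of the target note pitch information).
    """
    peak = max(target_notes)
    # Index values of every occurrence of the note pitch maximum.
    indices = [i for i, value in enumerate(target_notes) if value == peak]
    # Comparison value: (maximum index - minimum index) + 1.
    comparison_value = indices[-1] - indices[0] + 1
    # Equal counts mean there is no gap between the occurrences.
    return len(indices) == comparison_value


sustained = max_occurs_consecutively([60, 62, 62, 62, 60])   # peak held
vibrato_like = max_occurs_consecutively([60, 62, 60, 62, 60])  # peak at intervals
```

A peak that occurs only once yields a comparison value of 1 and a count of 1, so it is treated as consecutive, which agrees with the statement that a single maximum is generally not considered vibrato.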
It should be understood that the embodiment of the present application illustrates the method for determining the note pitch value of a lyric element only by taking, as an example, the case in which the lyric element has multiple pieces of target note pitch information; in the case that the lyric element has only one piece of target note pitch information, that target note pitch information may be used directly as the note pitch value of the lyric element.
In summary, according to the technical solution provided by the embodiment of the present application, the note pitch mode of a lyric element and the note pitch maximum of the lyric element are determined from the note pitch information of the stable sounding part of the lyric element, and the note pitch value of the lyric element is determined based on the note pitch mode and the note pitch maximum. By contrast, if the average of the note pitch information of the stable sounding part were used as the note pitch value of the lyric element, the calculated value might not exist in the note pitch information of the stable sounding part, so that a large deviation could exist between the determined note pitch value and the note pitch value that the singing object actually intends to sing. Because the note pitch mode and the note pitch maximum are both taken from the note pitch information itself, the value determined in this embodiment always exists in that information.
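The contrast with averaging can be seen in a small worked example. The note values below are made up for illustration and are not data from the original text.

```python
from statistics import mean, mode

# Hypothetical note values of the stable sounding part of one lyric element.
notes = [60, 60, 60, 62, 62]

average_value = mean(notes)  # arithmetic mean of the note values
mode_value = mode(notes)     # most frequent note value

# The average need not be one of the observed note values,
# whereas the mode is always taken from the data itself.
average_is_observed = average_value in notes
mode_is_observed = mode_value in notes
```

In this example the average falls between two semitones that were actually sung, which is the kind of deviation the averaging approach can introduce.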
In the following, the technical solution provided by the embodiment of the present application is described by taking, as an example, the case in which the lyric elements in the first song include Chinese characters.
Referring to FIG. 2, a flow chart of a note pitch value determination method provided by an embodiment of the present application is shown. The method can be applied to the computer equipment. The method may include several steps as follows.
Step 210, obtaining pitch information of lyric elements in a first song corresponding to the audio data. The audio data corresponding to the first song refers to audio data generated when the target object sings the first song, and the computer device can extract pitch information of each lyric element in the first song.
Step 220, determining, based on the pitch information of each lyric element, note pitch information of the corresponding lyric element. Illustratively, assume that the pitch information of the lyric element is f0 = [f01, f02, ..., f0n]; the note pitch information of the lyric element is then given by the formulas in the original figures (BDA0003036200450000111 and BDA0003036200450000112), which are not reproduced in this text.
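Since the conversion formula itself is not available in this text, the following Python sketch only shows what such a conversion could look like. It assumes the common MIDI convention, in which 440 Hz maps to note number 69 and each semitone is one unit; this mapping, the rounding to the nearest semitone, the skipping of non-positive (unvoiced) frames, and the names `hz_to_note_number` and `note_pitch_info` are all assumptions and may differ from the formula in the original figures.

```python
import math


def hz_to_note_number(f0_hz):
    """Map a frequency in Hz to the nearest semitone number.

    Assumed mapping (MIDI convention): 69 + 12 * log2(f / 440).
    This is NOT taken from the source figures.
    """
    return round(69 + 12 * math.log2(f0_hz / 440.0))


def note_pitch_info(f0):
    """Convert a pitch sequence f0 = [f01, f02, ..., f0n] to note numbers.

    Frames with non-positive pitch are skipped (an assumption about how
    unvoiced frames are marked).
    """
    return [hz_to_note_number(f) for f in f0 if f > 0]


example_notes = note_pitch_info([0.0, 440.0, 220.0, 880.0])
```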
Step 230, obtaining the note pitch information of the stable sounding part of each lyric element to obtain the target note pitch information of the corresponding lyric element. Optionally, the computer device obtains note pitch information of a target part in a final part of each lyric element to obtain target note pitch information of the corresponding lyric element, wherein the starting time of the target part is later than the starting time of the final part.
Step 240, based on the target note pitch information of each lyric element, determining a note pitch mode of the corresponding lyric element and a note pitch maximum of the corresponding lyric element. The mode of the pitch of the notes of the lyric elements is the pitch information of the target notes with the most repeated times in the pitch information of the target notes of the lyric elements; the maximum value of the note pitch of the lyric element is the target note pitch information with the maximum note pitch value in the target note pitch information of the lyric element.
Step 250, in the case that the note pitch mode of the lyric element is equal to the note pitch maximum of the lyric element, determining the note pitch mode of the lyric element or the note pitch maximum of the lyric element as the note pitch value of the lyric element.
Step 260, in the case that the note pitch mode of the lyric element is not equal to the note pitch maximum of the lyric element, acquiring the index values corresponding to the note pitch maximum of the lyric element. In the embodiment of the application, the index value corresponding to the pitch information of the lyric element is obtained by labeling the pitch information in order of occurrence time, from earliest to latest.
Step 270, determining whether the maximum note pitch of the lyric element continuously appears based on the index value corresponding to the maximum note pitch of the lyric element. Optionally, step 270 includes: determining a maximum index value and a minimum index value based on an index value corresponding to the maximum note pitch of the lyric elements; adding one to the difference value of the maximum index value and the minimum index value to obtain a comparison value; determining that the maximum note pitch of the lyric elements continuously appears under the condition that the number of target note pitch information corresponding to the maximum note pitch of the lyric elements is equal to the comparison value; and under the condition that the number of the target note pitch information corresponding to the maximum value of the note pitch of the lyric element is not equal to the comparison value, determining that the maximum value of the note pitch of the lyric element does not appear continuously.
Step 280, in the case that the note pitch maximum of the lyric element occurs consecutively, determining the note pitch maximum of the lyric element as the note pitch value of the lyric element.
Step 290, in the case that the note pitch maximum of the lyric element does not occur consecutively, determining the note pitch mode of the lyric element as the note pitch value of the lyric element.
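Steps 240 to 290 can be put together in one short Python sketch. The function name `note_pitch_value` and the list-of-integers input are assumptions for illustration; the single-element shortcut follows the remark that a lone piece of target note pitch information may be used directly.

```python
from collections import Counter


def note_pitch_value(target_notes):
    """Pick the note pitch value of one lyric element (steps 240 to 290).

    target_notes: non-empty list of note values ordered by occurrence time
    (hypothetical representation of the target note pitch information).
    """
    if len(target_notes) == 1:
        return target_notes[0]

    # Step 240: note pitch mode (ties go to the largest value) and maximum.
    counts = Counter(target_notes)
    top_count = max(counts.values())
    mode = max(v for v, c in counts.items() if c == top_count)
    peak = max(target_notes)

    # Step 250: mode and maximum agree, so either one is the answer.
    if mode == peak:
        return mode

    # Step 260: index values at which the maximum occurs.
    indices = [i for i, v in enumerate(target_notes) if v == peak]

    # Step 270: comparison value = (max index - min index) + 1.
    comparison_value = indices[-1] - indices[0] + 1

    # Step 280: a consecutive maximum is treated as the intended note.
    if len(indices) == comparison_value:
        return peak

    # Step 290: a non-consecutive maximum is treated as vibrato.
    return mode


held_peak = note_pitch_value([60, 60, 60, 62, 62])          # maximum is held
vibrato_peak = note_pitch_value([60, 60, 60, 62, 60, 62])   # maximum at intervals
```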
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 3, a block diagram of a device for determining a pitch value of a note according to an embodiment of the present application is shown. The apparatus 300 has functions of implementing the above method embodiments, and the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus 300 may be the computer device or may be provided in the computer device. The apparatus 300 may include: a first information acquisition module 310, a second information determination module 320, a second information acquisition module 330, and a note pitch value determination module 340.
The first information obtaining module 310 is configured to obtain pitch information of a lyric element in a first song corresponding to the audio data.
A second information determining module 320, configured to determine, based on the pitch information of each of the lyric elements, note pitch information of the corresponding lyric element, where the note pitch information of the lyric element includes pitch information corresponding to a note of the lyric element.
The second information obtaining module 330 is configured to obtain note pitch information of the stable sounding part of each lyric element, so as to obtain target note pitch information of the corresponding lyric element.
A note pitch value determining module 340 for determining a note pitch value of the corresponding lyric element based on the target note pitch information of each lyric element.
In one example, the lyric elements include Chinese characters; the second information obtaining module 330 is configured to: acquire note pitch information of the final part of each Chinese character to obtain target note pitch information of the corresponding Chinese character.
In one example, the second information obtaining module 330 is configured to: and acquiring note pitch information of a target part in the final part of each Chinese character to obtain target note pitch information of the corresponding Chinese character, wherein the starting time of the target part is later than that of the final part.
In one example, the vocalization time period of each of the lyric elements is a first time period, and the time period of the stable vocalization part of the corresponding lyric element is a second time period; the second information obtaining module 330 is configured to: taking the time later than the starting time of the first time period in the first time period as the starting time of the second time period; taking the time earlier than or equal to the end time of the first time period in the first time period as the end time of the second time period; and intercepting note pitch information of a stable sounding part of the corresponding lyric element from note pitch information of each lyric element based on the starting time of the second time period and the ending time of the second time period to obtain target note pitch information of the corresponding lyric element.
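The interception by time period can be sketched as follows. The text only requires that the second time period start later than the first and end no later than the first; the fixed offsets, the millisecond timestamps, and the name `stable_part` are assumptions chosen for illustration.

```python
def stable_part(note_info, times_ms, first_start_ms, first_end_ms,
                start_offset_ms=50, end_offset_ms=0):
    """Cut out the note pitch information of the stable sounding part.

    note_info: note values of one lyric element, one per frame.
    times_ms:  frame timestamps in milliseconds, same length as note_info.
    start_offset_ms must be positive so the second time period starts later
    than the first; end_offset_ms may be zero (end earlier than or equal to
    the end of the first time period). Both offsets are hypothetical.
    """
    if start_offset_ms <= 0 or end_offset_ms < 0:
        raise ValueError("offsets must keep the second period inside the first")
    second_start = first_start_ms + start_offset_ms
    second_end = first_end_ms - end_offset_ms
    return [note for note, t in zip(note_info, times_ms)
            if second_start <= t <= second_end]


notes = [58, 59, 60, 60, 60, 61]
times = [0, 20, 40, 60, 80, 100]
target = stable_part(notes, times, first_start_ms=0, first_end_ms=100)
trimmed = stable_part(notes, times, 0, 100, start_offset_ms=50, end_offset_ms=20)
```

In practice the offsets would have to be chosen from the data, for example from the boundary between the initial and the final of a Chinese character; the text does not specify how.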
In one example, as shown in FIG. 4, the note pitch value determination module 340 includes: a reference information determining unit 342 for determining a note pitch mode of the corresponding lyric element and a note pitch maximum of the corresponding lyric element based on the target note pitch information of each lyric element; a note pitch value determination unit 344 for determining a note pitch value for the respective lyric element based on a note pitch mode of the respective lyric element and a note pitch maximum of the respective lyric element; wherein the mode of the note pitch of the lyric element is the target note pitch information with the most repetition times in the target note pitch information of the lyric element; the maximum note pitch of the lyric element is the target note pitch information with the maximum note pitch value in the target note pitch information of the lyric element.
In one example, as shown in fig. 4, the note pitch value determination unit 344 is configured to: determining a note pitch mode of the respective lyric element as a note pitch value of the respective lyric element if the note pitch mode of the respective lyric element is equal to a note pitch maximum of the respective lyric element; or, in a case where a note pitch mode of the respective lyric element is equal to a note pitch maximum of the respective lyric element, determining the note pitch maximum of the respective lyric element as a note pitch value of the respective lyric element.
In one example, as shown in fig. 4, the note pitch value determination unit 344 is configured to: obtaining an index value corresponding to a note pitch maximum value of the respective lyric element if a note pitch mode of the respective lyric element is not equal to the note pitch maximum value of the respective lyric element; determining whether the maximum note pitch of the respective lyric element occurs continuously based on an index value corresponding to the maximum note pitch of the respective lyric element; determining a note pitch maximum of the respective lyric element as a note pitch value of the respective lyric element in case the note pitch maximum of the respective lyric element occurs continuously; determining a mode of note pitches of the respective lyric elements as note pitch values of the respective lyric elements if note pitch maxima of the respective lyric elements occur non-consecutively.
In one example, the index value corresponding to the pitch information of the lyric element is obtained by labeling the pitch information in order of occurrence time, from earliest to latest; the determining whether the note pitch maximum of the respective lyric element occurs consecutively based on the index values corresponding to the note pitch maximum of the respective lyric element includes: determining a maximum index value and a minimum index value among the index values corresponding to the note pitch maximum of the respective lyric element; adding one to the difference between the maximum index value and the minimum index value to obtain a comparison value; determining that the note pitch maximum of the respective lyric element occurs consecutively in the case that the number of pieces of target note pitch information corresponding to the note pitch maximum of the respective lyric element is equal to the comparison value; and determining that the note pitch maximum of the respective lyric element does not occur consecutively in the case that the number of pieces of target note pitch information corresponding to the note pitch maximum of the respective lyric element is not equal to the comparison value.
In one example, as shown in fig. 4, the apparatus further comprises: an audio data reconstruction module 350, configured to reconstruct the audio data based on the note pitch values of the lyric elements in the first song, to obtain reconstructed data of the first song; a music score data obtaining module 360, configured to obtain music score data of the first song; a training sample construction module 370, configured to construct a model training sample based on the music score data of the first song and the reconstructed data of the first song; and a singing model training module 380, configured to train a simulation singing model through the model training sample; the simulation singing model is used for simulating, according to music score data of a second song, audio data generated when the target object sings the second song.
In summary, according to the technical solution provided by the embodiment of the present application, pitch information of the lyric elements in a song is extracted from the audio data of the song and converted into note pitch information of the lyric elements, so that the singing characteristics of the lyric elements are represented more accurately; the note pitch information of each lyric element is then intercepted to obtain the note pitch information of the stable sounding part of the lyric element, and the note pitch value of the lyric element is determined based on the note pitch information of the stable sounding part of the lyric element. By intercepting the note pitch information of the lyric elements, the embodiment of the present application effectively removes note pitch information with large deviation caused by, for example, unstable breath of the singing object or unsmooth switching between different lyric elements, so that the remaining note pitch information of the lyric elements is closer to the note pitch value that the singing object actually intends to sing, which improves the accuracy of the calculated note pitch value of the lyric elements.
It should be noted that, in the device provided in the embodiment of the present application, when the functions of the device are implemented, only the division of the functional modules is illustrated, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus and method embodiments provided by the above embodiments belong to the same concept, and specific implementation processes thereof are described in the method embodiments for details, which are not described herein again.
Referring to fig. 5, a block diagram of a computer device according to an embodiment of the present application is shown. The computer device may be used to implement the above-described note pitch value determination method. Specifically:
The computer device 500 includes a processing unit 501 (e.g., a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), an FPGA (Field Programmable Gate Array), etc.), a system memory 504 including a RAM (Random-Access Memory) 502 and a ROM (Read-Only Memory) 503, and a system bus 505 connecting the system memory 504 and the processing unit 501. The computer device 500 also includes an I/O System (basic Input/Output System) 506 that facilitates transfer of information between devices within the computer device, and a mass storage device 507 for storing an operating system 513, application programs 514, and other program modules 515.
The I/O system 506 includes a display 508 for displaying information and an input device 509 such as a mouse, keyboard, etc. for user input of information. Wherein the display 508 and the input device 509 are connected to the processing unit 501 through an input output controller 510 connected to the system bus 505. The I/O system 506 may also include an input-output controller 510 for receiving and processing input from a number of other devices, such as a keyboard, mouse, or electronic stylus. Similarly, input-output controller 510 also provides output to a display screen, a printer, or other type of output device.
The mass storage device 507 is connected to the processing unit 501 through a mass storage controller (not shown) connected to the system bus 505. The mass storage device 507 and its associated computer-readable media provide non-volatile storage for the computer device 500. That is, the mass storage device 507 may include a computer-readable medium (not shown) such as a hard disk or a CD-ROM (Compact disk Read-Only Memory) drive.
Without loss of generality, the computer-readable media may comprise computer storage media and communication media. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. Computer storage media include RAM, ROM, EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory or other solid-state memory technology, CD-ROM, DVD (Digital Video Disc) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will appreciate that the computer storage media are not limited to the foregoing. The system memory 504 and mass storage device 507 described above may be collectively referred to as memory.
According to embodiments of the present application, the computer device 500 may also run by connecting, through a network such as the Internet, to a remote computer on the network. That is, the computer device 500 may be connected to the network 512 through the network interface unit 511 connected to the system bus 505, or the network interface unit 511 may be used to connect to other types of networks or remote computer systems (not shown).
The memory also includes a computer program stored in the memory and configured to be executed by the one or more processors to implement the above-described method of determining a pitch value of a note.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described method of determining a pitch value of a note.
In an exemplary embodiment, there is also provided a computer program product which, when run on a computer device, causes the computer device to perform the above-described method of determining a pitch value of a note.
It should be understood that reference to "a plurality" herein means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The above description is only exemplary of the present application and should not be taken as limiting the present application, and any modifications, equivalents, improvements and the like that are made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (12)

1. A method of determining the pitch value of a note, the method comprising:
acquiring pitch information of lyric elements in a first song corresponding to the audio data;
determining note pitch information for each of the lyric elements based on the pitch information of the lyric element, the note pitch information of the lyric element including pitch information corresponding to notes of the lyric element;
obtaining note pitch information of a stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element;
based on the target note pitch information for each of the lyric elements, a note pitch value for the respective lyric element is determined.
2. The method of claim 1, wherein the lyric elements comprise Chinese characters;
the obtaining of note pitch information of a stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element includes:
and acquiring note pitch information of the final part of each Chinese character to obtain target note pitch information of the corresponding Chinese character.
3. The method of claim 2, wherein the obtaining of the note pitch information of the final part of each Chinese character to obtain the target note pitch information of the corresponding Chinese character comprises:
and acquiring note pitch information of a target part in the final part of each Chinese character to obtain target note pitch information of the corresponding Chinese character, wherein the starting time of the target part is later than that of the final part.
4. The method of claim 1, wherein the period of time for which each of the lyric elements is voiced is a first period of time, and the period of time corresponding to a stable voiced portion of the lyric element is a second period of time;
the obtaining of note pitch information of a stably sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element includes:
taking the time later than the starting time of the first time period in the first time period as the starting time of the second time period;
taking the time earlier than or equal to the end time of the first time period in the first time period as the end time of the second time period;
and intercepting note pitch information of a stable sounding part of the corresponding lyric element from note pitch information of each lyric element based on the starting time of the second time period and the ending time of the second time period to obtain target note pitch information of the corresponding lyric element.
5. The method of claim 1, wherein determining a note pitch value for each of the lyric elements based on the target note pitch information for the lyric element comprises:
determining a mode of note pitch for the respective lyric element and a maximum of note pitch for the respective lyric element based on the target note pitch information for each of the lyric elements;
determining a note pitch value for the respective lyric element based on a note pitch mode of the respective lyric element and a note pitch maximum of the respective lyric element;
wherein the mode of the note pitch of the lyric element is the target note pitch information with the most repetition times in the target note pitch information of the lyric element; the maximum note pitch of the lyric element is the target note pitch information with the maximum note pitch value in the target note pitch information of the lyric element.
6. The method of claim 5, wherein determining the note pitch value for the respective lyric element based on a note pitch mode of the respective lyric element and a note pitch maximum of the respective lyric element comprises:
determining a note pitch mode of the respective lyric element as a note pitch value of the respective lyric element if the note pitch mode of the respective lyric element is equal to a note pitch maximum of the respective lyric element;
or,
determining a note pitch maximum for the respective lyric element as a note pitch value for the respective lyric element if a note pitch mode of the respective lyric element is equal to a note pitch maximum for the respective lyric element.
7. The method of claim 5, wherein determining the note pitch value for the respective lyric element based on a note pitch mode of the respective lyric element and a note pitch maximum of the respective lyric element comprises:
obtaining an index value corresponding to a note pitch maximum value of the respective lyric element if a note pitch mode of the respective lyric element is not equal to the note pitch maximum value of the respective lyric element;
determining whether the maximum note pitch of the respective lyric element occurs continuously based on an index value corresponding to the maximum note pitch of the respective lyric element;
determining a note pitch maximum of the respective lyric element as a note pitch value of the respective lyric element in case the note pitch maximum of the respective lyric element occurs continuously;
determining a mode of note pitches of the respective lyric elements as note pitch values of the respective lyric elements if note pitch maxima of the respective lyric elements occur non-consecutively.
8. The method of claim 7, wherein the index value corresponding to the pitch information of the lyric element is obtained by labeling the pitch information in order of occurrence time, from earliest to latest;
the determining whether the note pitch maxima of the respective lyric elements occur consecutively based on the index value corresponding to the note pitch maxima of the respective lyric elements comprises:
determining a maximum index value and a minimum index value based on an index value corresponding to a maximum note pitch of the respective lyric element;
adding one to the difference value of the maximum index value and the minimum index value to obtain a comparison value;
determining that the maximum note pitch of the corresponding lyric elements continuously appears under the condition that the number of the target note pitch information corresponding to the maximum note pitch of the corresponding lyric elements is equal to the comparison value;
and under the condition that the number of the target note pitch information corresponding to the maximum note pitch value of the corresponding lyric element is not equal to the comparison value, determining that the maximum note pitch value of the corresponding lyric element does not appear continuously.
9. The method of any of claims 1-8, wherein after determining the note pitch value for each of the lyric elements based on the target note pitch information for the lyric element, further comprising:
reconstructing the audio data based on a note pitch value of the lyric element in the first song to obtain reconstructed data of the first song;
acquiring music score data of the first song;
constructing a model training sample based on the music score data of the first song and the reconstruction data of the first song;
training a simulation singing model through the model training sample;
the simulation singing model is used for simulating audio data generated when the target object sings the second song according to music score data of the second song.
10. An apparatus for determining the pitch value of a note, the apparatus comprising:
a first information acquisition module, configured to acquire pitch information of lyric elements in a first song corresponding to the audio data;
a second information determining module, configured to determine, based on pitch information of each of the lyric elements, note pitch information of the corresponding lyric element, where the note pitch information of the lyric element includes pitch information corresponding to a note of the lyric element;
a second information acquisition module, configured to acquire note pitch information of the stable sounding part of each lyric element to obtain target note pitch information of the corresponding lyric element;
a note pitch value determination module for determining a note pitch value for each of the lyric elements based on the target note pitch information for the lyric element.
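The way the four modules of claim 10 feed into one another can be sketched as a simple pipeline. This is only a structural illustration: the class name `NotePitchApparatus`, its field names, and the data types are hypothetical. The behaviour of each module is supplied by the caller, because this claim does not define how each step is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

PitchTrack = List[float]  # assumed representation: pitch values in time order


@dataclass
class NotePitchApparatus:
    # First information acquisition module: audio data -> pitch information
    # of each lyric element (element label paired with its pitch track).
    acquire_pitch: Callable[[bytes], List[Tuple[str, PitchTrack]]]
    # Second information determining module: pitch information -> note
    # pitch information of the lyric element.
    determine_note_pitch: Callable[[PitchTrack], PitchTrack]
    # Second information acquisition module: keeps only the stable sounding
    # part, yielding the target note pitch information.
    acquire_stable_part: Callable[[PitchTrack], PitchTrack]
    # Note pitch value determination module: target note pitch information
    # -> a single note pitch value.
    determine_value: Callable[[PitchTrack], float]

    def run(self, audio_data: bytes) -> List[Tuple[str, float]]:
        """Chain the four modules in the order listed in claim 10."""
        results = []
        for element, pitch_info in self.acquire_pitch(audio_data):
            note_pitch_info = self.determine_note_pitch(pitch_info)
            target_info = self.acquire_stable_part(note_pitch_info)
            results.append((element, self.determine_value(target_info)))
        return results
```

Any concrete callables passed in (for example, rounding to the nearest semitone or trimming the first and last frames) would be placeholders chosen by the caller, not the method described in the claims.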
11. A computer device, characterized in that it comprises a processor and a memory, in which a computer program is stored, which computer program is loaded and executed by the processor to implement the method of determining the pitch value of a note according to any one of claims 1 to 9.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of determining a pitch value of a note according to any one of claims 1 to 9.
CN202110444040.5A 2021-04-23 2021-04-23 Method, device, equipment and storage medium for determining note pitch value Active CN113140230B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110444040.5A CN113140230B (en) 2021-04-23 2021-04-23 Method, device, equipment and storage medium for determining note pitch value

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110444040.5A CN113140230B (en) 2021-04-23 2021-04-23 Method, device, equipment and storage medium for determining note pitch value

Publications (2)

Publication Number Publication Date
CN113140230A (en) 2021-07-20
CN113140230B (en) 2023-07-04

Family

ID=76812454

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110444040.5A Active CN113140230B (en) 2021-04-23 2021-04-23 Method, device, equipment and storage medium for determining note pitch value

Country Status (1)

Country Link
CN (1) CN113140230B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063372A (en) * 2019-12-30 2020-04-24 广州酷狗计算机科技有限公司 Method, device and equipment for determining pitch characteristics and storage medium
CN114078464A (en) * 2022-01-19 2022-02-22 腾讯科技(深圳)有限公司 Audio processing method, device and equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000200090A (en) * 1998-12-29 2000-07-18 Nec Corp Device and method for extracting pitch information, and stored medium storing pitch information extracting program therein
JP2003108177A (en) * 2001-09-27 2003-04-11 Roland Corp Voice synthesizing method and generating method for consonant phoneme piece data
WO2004084175A1 (en) * 2003-03-20 2004-09-30 Sony Corporation Singing voice synthesizing method, singing voice synthesizing device, program, recording medium, and robot
CN1870130A (en) * 2005-05-24 2006-11-29 株式会社东芝 Pitch pattern generation method and its apparatus
CN103559875A (en) * 2013-10-21 2014-02-05 福建星网视易信息系统有限公司 Pitch jitter correction method, device and system, audio and video equipment and mobile terminal
EP2747074A1 (en) * 2012-12-21 2014-06-25 Harman International Industries, Inc. Dynamically adapted pitch correction based on audio input
CN106157979A (en) * 2016-06-24 2016-11-23 广州酷狗计算机科技有限公司 A kind of method and apparatus obtaining voice pitch data
CN108109634A (en) * 2017-12-15 2018-06-01 广州酷狗计算机科技有限公司 Generation method, device and the equipment of song pitch
CN110335629A (en) * 2019-06-28 2019-10-15 腾讯音乐娱乐科技(深圳)有限公司 Pitch recognition methods, device and the storage medium of audio file
CN111063372A (en) * 2019-12-30 2020-04-24 广州酷狗计算机科技有限公司 Method, device and equipment for determining pitch characteristics and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063372A (en) * 2019-12-30 2020-04-24 广州酷狗计算机科技有限公司 Method, device and equipment for determining pitch characteristics and storage medium
CN114078464A (en) * 2022-01-19 2022-02-22 腾讯科技(深圳)有限公司 Audio processing method, device and equipment
CN114078464B (en) * 2022-01-19 2022-03-22 腾讯科技(深圳)有限公司 Audio processing method, device and equipment

Also Published As

Publication number Publication date
CN113140230B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN108831437B (en) Singing voice generation method, singing voice generation device, terminal and storage medium
CN106528678B (en) A kind of song processing method and processing device
CN113012665B (en) Music generation method and training method of music generation model
CN110164460A (en) Sing synthetic method and device
CN111613199B (en) MIDI sequence generating device based on music theory and statistical rule
CN114073854A (en) Game method and system based on multimedia file
CN113140230A (en) Method, device and equipment for determining pitch value of note and storage medium
JP4792703B2 (en) Speech analysis apparatus, speech analysis method, and speech analysis program
CN109841202B (en) Rhythm generation method and device based on voice synthesis and terminal equipment
CN103098124B (en) Method and system for text to speech conversion
CN113010730B (en) Music file generation method, device, equipment and storage medium
CN111445922B (en) Audio matching method, device, computer equipment and storage medium
CN112908308A (en) Audio processing method, device, equipment and medium
JP2022515173A (en) Audio clip matching method and its equipment, computer programs and electronic devices
CN114373480A (en) Training method of voice alignment network, voice alignment method and electronic equipment
CN112989109A (en) Music structure analysis method, electronic equipment and storage medium
JP7372402B2 (en) Speech synthesis method, device, electronic device and storage medium
CN116612788A (en) Emotion recognition method, device, equipment and medium for audio data
CN110728972B (en) Method and device for determining tone similarity and computer storage medium
CN110428668B (en) Data extraction method and device, computer system and readable storage medium
CN115050351A (en) Method and device for generating timestamp and computer equipment
CN114613359A (en) Language model training method, audio recognition method and computer equipment
CN118692425B (en) Training method of music recognition model, music recognition method and related equipment
CN111429949A (en) Pitch line generation method, device, equipment and storage medium
CN116645957B (en) Music generation method, device, terminal, storage medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant