CN104882152A - Method and apparatus for generating lyric file

Publication number: CN104882152A (granted as CN104882152B)
Application number: CN201510257914.0A
Original language: Chinese (zh)
Inventors: 武大伟, 赵普, 任思豪, 龚维
Assignee: Guangzhou Kugou Computer Technology Co Ltd
Legal status: Active (granted)

Abstract

The invention discloses a method and an apparatus for generating a lyric file, belonging to the technical field of audio processing. The method includes: obtaining a reference audio file corresponding to a target audio file to be processed, wherein the reference audio file and the target audio file are different versions of the same song; calculating the time offset between the reference audio file and the target audio file; and correcting the timestamps of the lyric file of the reference audio file according to the time offset, the corrected lyric file serving as the lyric file of the target audio file. The method and apparatus solve the problems of low efficiency and high cost in generating lyric files manually, improve the efficiency of lyric-file generation, and lower its cost.

Description

Method and device for generating lyric file
Technical Field
The invention relates to the technical field of audio processing, in particular to a method and a device for generating a lyric file.
Background
As users' expectations of the audio-visual experience grow, music applications are expected to display lyrics when users view, listen to, or sing along with musical works.
In order to meet the requirements of users, developers of application programs need to generate lyric files matched with different song files. In the related art, a lyric file matched with a song file is manually generated for the song file.
However, generating lyric files manually is not only inefficient but also costly, and as the music library keeps growing, these drawbacks of the manual approach become increasingly severe.
Disclosure of Invention
In order to solve the problems of low efficiency and high cost in the manual generation of lyric files in the related art, the embodiment of the invention provides a method and a device for generating lyric files. The technical scheme is as follows:
in a first aspect, a method for generating a lyric file is provided, the method comprising:
acquiring a reference audio file corresponding to a target audio file to be processed, wherein the reference audio file and the target audio file belong to different versions of the same song;
calculating a time offset between the reference audio file and the target audio file;
and correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation, and taking the corrected lyric file as the lyric file of the target audio file.
Optionally, the obtaining of the reference audio file corresponding to the target audio file to be processed includes:
acquiring at least one candidate reference audio file corresponding to the target audio file, wherein each candidate reference audio file and the target audio file belong to different versions of the same song;
sorting the at least one candidate reference audio file according to a preset sorting rule;
sequentially selecting the candidate reference audio files one by one according to the sorting result;
detecting whether the selected candidate reference audio file has strong correlation with the target audio file;
and when the first candidate reference audio file having a strong correlation with the target audio file is obtained, stopping the selection of further candidate reference audio files and taking that candidate reference audio file as the reference audio file.
Optionally, the detecting whether there is a strong correlation between the selected candidate reference audio file and the target audio file includes:
calculating a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, wherein the cross-correlation coefficient sequence comprises at least one cross-correlation coefficient;
selecting the maximum cross-correlation coefficient $p_0$ from the cross-correlation coefficient sequence;
obtaining the position offset $m_0$ corresponding to the maximum value $p_0$;
selecting, according to the position offset $m_0$, the maximum value $p_1$ from the correlation coefficients in a first offset interval $[m_0+m_{min},\, m_0+m_{max}]$ and a second offset interval $[m_0-m_{max},\, m_0-m_{min}]$, where $1 \le m_{min} < m_{max}$;
detecting whether the ratio $p_0/p_1$ between the maximum value $p_0$ and the maximum value $p_1$ is greater than a preset threshold;
and if the ratio $p_0/p_1$ is greater than the preset threshold, determining that the selected candidate reference audio file has a strong correlation with the target audio file.
Optionally, the calculating a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file includes:
sampling from the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sampling sequence, and sampling from the target audio file at the preset sampling rate to obtain a target audio sampling sequence;
extracting audio data with preset length from the same position of the candidate audio sampling sequence and the target audio sampling sequence to respectively obtain a candidate audio data sequence and a target audio data sequence;
a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence is calculated.
Optionally, the calculating a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence includes:
calculating a cross-correlation coefficient sequence $R_{xy}(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ according to the following formula:
$$R_{xy}(m) = \sum_{n=0}^{N-1} x(n+m)\, y(n);$$
where $m \in [-(N-1),\, N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
Optionally, the calculating a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence includes:
extracting audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extracting audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n) = x(kn)$, $y'(n) = y(kn)$, the preset interval is $k$ audio samples, and $k$ is a positive integer;
calculating a coarse cross-correlation coefficient sequence $R_{xy'}(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ according to the following formula:
$$R_{xy'}(m) = \sum_{n=0}^{(N-1)/k} x'(n+m)\, y'(n);$$
where $m \in [-(N-1)/k,\, (N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer;
obtaining the position offset $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R_{xy'}(m)$;
with the position offset between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, truncating a candidate audio data truncated sequence $x''(n)$ and a target audio data truncated sequence $y''(n)$ of a target length from the corresponding positions of $x(n)$ and $y(n)$, respectively;
calculating a precise cross-correlation coefficient sequence $R_{xy''}(m)$ between the candidate audio data truncated sequence $x''(n)$ and the target audio data truncated sequence $y''(n)$ according to the following formula:
$$R_{xy''}(m) = \sum_{n=0}^{N_0} x''(n+m)\, y''(n);$$
where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, and $N_0$ denotes the target length and is a preset value; the position offset $m_2$ corresponding to the maximum value in the precise cross-correlation coefficient sequence $R_{xy''}(m)$ is the precise position offset.
Optionally, the obtaining at least one candidate reference audio file corresponding to the target audio file includes:
obtaining the classification to which the target audio file belongs, wherein the classification is any one of a single-song class, a live class, an accompaniment class, and a silenced (vocal-removed) class;
determining a target classification for searching the candidate reference audio file according to the classification to which the target audio file belongs;
searching, in the target classification, for audio files that meet a preset selection condition as the candidate reference audio files; wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-quality audio file.
Optionally, the determining a target classification for searching for the candidate reference audio file according to the classification to which the target audio file belongs includes:
when the classification to which the target audio file belongs is the single-song class, determining the single-song class as the target classification; or,
when the classification to which the target audio file belongs is the live class, determining the live class as the target classification; or,
when the classification to which the target audio file belongs is the accompaniment class, determining the accompaniment class, the single-song class, and the live class as the target classification; or,
when the classification to which the target audio file belongs is the silenced class, determining the silenced class, the single-song class, and the live class as the target classification.
In a second aspect, an apparatus for generating a lyric file is provided, the apparatus comprising:
an acquisition module, configured to acquire a reference audio file corresponding to a target audio file to be processed, wherein the reference audio file and the target audio file belong to different versions of the same song;
a calculation module for calculating a time offset between the reference audio file and the target audio file;
and the correction module is used for correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation and taking the corrected lyric file as the lyric file of the target audio file.
Optionally, the obtaining module includes: an obtaining submodule, a sorting submodule, a selecting submodule, a detecting submodule, and a determining submodule;
the obtaining submodule is used for obtaining at least one candidate reference audio file corresponding to the target audio file, and each candidate reference audio file and the target audio file belong to different versions of the same song;
the sorting submodule is used for sorting the at least one candidate reference audio file according to a preset sorting rule;
the selection submodule is used for sequentially selecting the candidate reference audio files one by one according to the sorting result;
the detection submodule is used for detecting whether the selected candidate reference audio file has strong correlation with the target audio file;
the determining sub-module is configured to, when a candidate reference audio file having a strong correlation with the target audio file is obtained, stop selecting a next candidate reference audio file, and use the candidate reference audio file having the strong correlation with the target audio file as the reference audio file.
Optionally, the detection submodule includes: a calculation unit, a first selection unit, an acquisition unit, a second selection unit, a detection unit, and a determination unit;
the computing unit is used for computing a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, wherein the cross-correlation coefficient sequence comprises at least one cross-correlation coefficient;
the first selecting unit is used for selecting the maximum value p of the cross correlation coefficient from the cross correlation coefficient sequence0
The obtaining unit is used for obtaining the maximum value p0Corresponding position deviation m0
The second selection unit is used for selecting the position deviation m0In a first position deviation interval m0+mmin,m0+mmax]And a second position deviation interval [ m0-mmax,m0-mmin]Selecting maximum value p from corresponding correlation coefficients1,1≤mmin<mmax
The detection unit is used for detecting the maximum value p0And the maximumValue p1Ratio p between0/p1Whether the threshold value is greater than a preset threshold value;
the determination unit is used for determining the ratio p0/p1And when the correlation value is larger than the preset threshold value, determining that the selected candidate reference audio file and the target audio file have strong correlation.
Optionally, the computing unit includes: a sampling subunit, an extraction subunit, and a calculation subunit;
the sampling subunit is configured to sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sampling sequence, and sample the target audio file at the preset sampling rate to obtain a target audio sampling sequence;
the extraction subunit is configured to extract audio data with a preset length from the same position of the candidate audio sample sequence and the target audio sample sequence, so as to obtain a candidate audio data sequence and a target audio data sequence respectively;
the calculating subunit is configured to calculate a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, the calculating subunit is specifically configured to:
calculating a cross-correlation coefficient sequence $R_{xy}(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ according to the following formula:
$$R_{xy}(m) = \sum_{n=0}^{N-1} x(n+m)\, y(n);$$
where $m \in [-(N-1),\, N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
Optionally, the calculating subunit is specifically configured to:
extracting audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extracting audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n) = x(kn)$, $y'(n) = y(kn)$, the preset interval is $k$ audio samples, and $k$ is a positive integer;
calculating a coarse cross-correlation coefficient sequence $R_{xy'}(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ according to the following formula:
$$R_{xy'}(m) = \sum_{n=0}^{(N-1)/k} x'(n+m)\, y'(n);$$
where $m \in [-(N-1)/k,\, (N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer;
obtaining the position offset $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R_{xy'}(m)$;
with the position offset between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, truncating a candidate audio data truncated sequence $x''(n)$ and a target audio data truncated sequence $y''(n)$ of a target length from the corresponding positions of $x(n)$ and $y(n)$, respectively;
calculating a precise cross-correlation coefficient sequence $R_{xy''}(m)$ between the candidate audio data truncated sequence $x''(n)$ and the target audio data truncated sequence $y''(n)$ according to the following formula:
$$R_{xy''}(m) = \sum_{n=0}^{N_0} x''(n+m)\, y''(n);$$
where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, and $N_0$ denotes the target length and is a preset value; the position offset $m_2$ corresponding to the maximum value in the precise cross-correlation coefficient sequence $R_{xy''}(m)$ is the precise position offset.
Optionally, the obtaining sub-module includes: a classification acquisition unit, a classification determination unit, and a file search unit;
the classification acquisition unit is configured to acquire the classification to which the target audio file belongs, wherein the classification is any one of the single-song class, the live class, the accompaniment class, and the silenced class;
the classification determining unit is used for determining a target classification for searching the candidate reference audio file according to the classification to which the target audio file belongs;
the file searching unit is configured to search, in the target classification, for audio files that meet a preset selection condition as the candidate reference audio files; wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-quality audio file.
Optionally, the classification determining unit includes:
a first classification determining subunit, configured to determine the single-song class as the target classification when the classification to which the target audio file belongs is the single-song class; and/or,
a second classification determining subunit, configured to determine the live class as the target classification when the classification to which the target audio file belongs is the live class; and/or,
a third classification determining subunit, configured to determine the accompaniment class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the accompaniment class; and/or,
a fourth classification determining subunit, configured to determine the silenced class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the silenced class.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
calculating the time deviation between the reference audio file and the target audio file, and then correcting a time stamp corresponding to the lyric file of the reference audio file according to the time deviation to obtain the lyric file of the target audio file; the problems of low efficiency and high cost of manually generating the lyric file in the related technology are solved; the technical effects of improving the efficiency of generating the lyric file and reducing the cost are achieved.
Drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow diagram of a method for generating a lyric file, provided by an embodiment of the present invention;
FIG. 2A is a flow chart of a method for generating a lyric file according to another embodiment of the present invention;
FIG. 2B is a flow chart of step 201 according to another embodiment of the present invention;
FIG. 2C is a flow chart of step 204 in accordance with another embodiment of the present invention;
FIG. 2D is a flowchart of step 204a according to another embodiment of the present invention;
FIG. 3 is a block diagram of an apparatus for generating a lyric file according to an embodiment of the present invention;
FIG. 4 is a block diagram of an apparatus for generating a lyric file according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The method for generating the lyric file provided by the embodiment of the invention can be applied to any electronic equipment with computing and processing capabilities. For example, the electronic device may be a server, or may be a terminal such as a mobile phone, a multimedia player, or a computer.
Referring to fig. 1, a flowchart of a method for generating a lyric file according to an embodiment of the present invention is shown, where the method may include the following steps:
step 102, obtaining a reference audio file corresponding to a target audio file to be processed, wherein the reference audio file and the target audio file belong to different versions of the same song.
Step 104, calculating the time offset between the reference audio file and the target audio file.
And step 106, correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation, and taking the corrected lyric file as the lyric file of the target audio file.
In summary, in the method provided in this embodiment, the time offset between the reference audio file and the target audio file is calculated, and then the time stamp corresponding to the lyric file of the reference audio file is corrected according to the time offset, so as to obtain the lyric file of the target audio file; the problems of low efficiency and high cost of manually generating the lyric file in the related technology are solved; the technical effects of improving the efficiency of generating the lyric file and reducing the cost are achieved.
Referring to fig. 2A, a flowchart of a method for generating a lyric file according to another embodiment of the present invention is shown, where the method may include the following steps:
step 201, at least one candidate reference audio file corresponding to the target audio file is obtained, and each candidate reference audio file and the target audio file belong to different versions of the same song.
In a song library, there are usually multiple different versions of the same song (or entry), such as a single-song version, a live version, an accompaniment version, etc. There may also be multiple different versions of the same type for the same song; for example, multiple single-song versions sung by different singers, or multiple live versions sung by the same singer at different concerts. When generating the corresponding lyric file for the target audio file, the song library is searched for other audio files that belong to the same song as the target audio file, and these are acquired as candidate reference audio files.
Alternatively, as shown in fig. 2B, this step may include several sub-steps as follows:
step 201a, obtaining the classification of the target audio file.
The classification to which the target audio file belongs includes, but is not limited to, any one of the following: the single-song class, the live class, the accompaniment class, and the silenced (vocal-removed) class.
Step 201b, determining a target classification for searching candidate reference audio files according to the classification to which the target audio file belongs.
In one possible implementation, the class to which the target audio file belongs is directly determined as the target class for finding the candidate reference audio file. That is, after the classification to which the target audio file belongs is obtained, the candidate reference audio file is directly searched under the classification.
In another possible embodiment, step 201b may include the following cases:
1) when the classification to which the target audio file belongs is the single-song class, determining the single-song class as the target classification;
2) when the classification to which the target audio file belongs is the live class, determining the live class as the target classification;
3) when the classification to which the target audio file belongs is the accompaniment class, determining the accompaniment class, the single-song class, and the live class as the target classifications;
4) when the classification to which the target audio file belongs is the silenced class, determining the silenced class, the single-song class, and the live class as the target classifications.
Of course, the above two possible embodiments are only exemplary and explanatory, and the present embodiment does not limit other possible embodiments.
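As an illustration only, the second embodiment's case analysis can be written as a small lookup table. The following sketch is a hypothetical Python representation; the class names and function name are assumptions and do not come from the patent:

```python
# Hypothetical lookup table for step 201b (second embodiment).
# Class names are illustrative, not from the patent.
SINGLE, LIVE, ACCOMPANIMENT, SILENCED = "single", "live", "accompaniment", "silenced"

TARGET_CLASSES = {
    SINGLE: [SINGLE],                              # case 1)
    LIVE: [LIVE],                                  # case 2)
    ACCOMPANIMENT: [ACCOMPANIMENT, SINGLE, LIVE],  # case 3)
    SILENCED: [SILENCED, SINGLE, LIVE],            # case 4)
}

def target_classifications(target_class: str) -> list[str]:
    """Return the classes in which candidate reference audio files are searched."""
    return TARGET_CLASSES[target_class]
```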
Step 201c, searching audio files meeting the preset selection condition in the target classification as candidate reference audio files.
Wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-quality audio file. In this embodiment, selecting audio files that meet the preset selection condition as candidate reference audio files ensures both that a candidate has a precisely matched lyric file and that the candidate is of adequate quality, which improves the accuracy of the subsequent calculation and correction.
Step 202, at least one candidate reference audio file is sorted according to a preset sorting rule.
After at least one candidate reference audio file corresponding to the target audio file is obtained, the candidates are sorted according to a preset sorting rule, which includes at least one of the following: candidates in the same class as the target audio file are prioritized, high-quality candidates are prioritized, and high-popularity candidates are prioritized. Setting such a sorting rule ensures that candidates more similar to the target audio file and of higher quality are tried first in the subsequent matching calculation, which improves selection efficiency and saves computation and processing overhead.
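A minimal sketch of this sorting step, assuming each candidate carries `audio_class`, `high_quality`, and `popularity` attributes (these field names are assumptions, not from the patent):

```python
def sort_candidates(candidates, target_class):
    # Preset sorting rule from step 202: candidates in the same class as
    # the target come first, then high-quality files, then more popular
    # files. False (0) sorts before True (1) in each key component.
    return sorted(
        candidates,
        key=lambda c: (
            c.audio_class != target_class,
            not c.high_quality,
            -c.popularity,
        ),
    )
```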
Step 203, sequentially selecting candidate reference audio files one by one according to the sorting result.
Step 204, detecting whether the selected candidate reference audio file has a strong correlation with the target audio file.
In this embodiment, in order to ensure the correction accuracy of the lyric file, when a reference audio file is selected from candidate reference audio files, it is necessary to detect and analyze the correlation between the candidate reference audio file and the target audio file, and select the candidate reference audio file having a strong correlation with the target audio file as a final reference audio file.
Alternatively, as shown in fig. 2C, this step may include several sub-steps as follows:
step 204a, calculating a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, wherein the cross-correlation coefficient sequence comprises at least one cross-correlation coefficient.
As shown in fig. 2D, in one possible implementation, in order to reduce the amount of computation and improve the computation efficiency, step 204a may include the following sub-steps:
step 204a1, sampling the candidate audio sample sequence from the selected candidate reference audio file at a preset sampling rate, and sampling the target audio sample sequence from the target audio file at the preset sampling rate.
To conveniently handle audio files with different bit rates and to reduce computation time, this embodiment down-samples both the selected candidate reference audio file and the target audio file to a preset sampling rate. The preset sampling rate can be chosen according to actual requirements, for example 8 kHz or 4 kHz.
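The patent does not prescribe a particular resampling algorithm; as one possibility, the sketch below uses SciPy's polyphase resampler to bring a mono signal to the preset rate:

```python
import numpy as np
from scipy.signal import resample_poly

def downsample(samples: np.ndarray, src_rate: int, dst_rate: int = 8000) -> np.ndarray:
    """Down-sample a mono PCM signal to the preset sampling rate (e.g. 8 kHz)."""
    g = np.gcd(src_rate, dst_rate)
    return resample_poly(samples, dst_rate // g, src_rate // g)
```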
Step 204a2, extracting audio data with preset length from the same position of the candidate audio sample sequence and the target audio sample sequence, and respectively obtaining a candidate audio data sequence and a target audio data sequence.
The preset length may be preset according to actual requirements, for example, the preset length is 10 s.
Step 204a3, a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence is calculated.
Optionally, the cross-correlation coefficient sequence $R_{xy}(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ is calculated according to the following formula:
$$R_{xy}(m) = \sum_{n=0}^{N-1} x(n+m)\, y(n);$$
where $m \in [-(N-1),\, N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
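For reference, this formula matches NumPy's full-mode correlation, where out-of-range terms contribute zero; a minimal sketch (not from the patent):

```python
import numpy as np

def cross_correlation(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return R_xy(m) for m = -(N-1) .. N-1, as defined above.

    np.correlate with mode="full" evaluates sum_n x(n+m) * y(n), summing
    only over 0 <= n, n+m <= N-1; output index i corresponds to the
    position offset m = i - (N - 1).
    """
    assert len(x) == len(y)
    return np.correlate(x, y, mode="full")
```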
Step 204b, selecting the maximum cross-correlation coefficient $p_0$ from the cross-correlation coefficient sequence.
Step 204c, obtaining the position offset $m_0$ corresponding to the maximum value $p_0$.
Step 204d, selecting, according to the position offset $m_0$, the maximum value $p_1$ from the correlation coefficients in a first offset interval $[m_0+m_{min},\, m_0+m_{max}]$ and a second offset interval $[m_0-m_{max},\, m_0-m_{min}]$, where $1 \le m_{min} < m_{max}$.
Step 204e, detecting whether the ratio $p_0/p_1$ between the maximum value $p_0$ and the maximum value $p_1$ is greater than a preset threshold.
Step 204f, if the ratio $p_0/p_1$ is greater than the preset threshold, determining that the selected candidate reference audio file has a strong correlation with the target audio file.
The points to be explained are: the analysis of the cross-correlation coefficients in steps 204b to 204f is crucial in this embodiment, because it determines the performance of the whole system, i.e., the accuracy of the subsequent time-offset calculation and correction. Although a maximum value $p_0$ can always be found in the calculated cross-correlation coefficient sequence $R_{xy}(m)$, the time offset derived from its corresponding position offset $m_0$ is not necessarily trustworthy. When $p_0$ is large in absolute terms but not sufficiently larger than the other cross-correlation coefficients in $R_{xy}(m)$, the correlation between the selected candidate reference audio file and the target audio file is not strong. Therefore, in this embodiment, the ratio $p_0/p_1$ is compared against a preset threshold to determine whether the selected candidate reference audio file has a strong correlation with the target audio file. If a strong correlation is found, the selected candidate is taken as the final reference audio file; otherwise, the next candidate is selected, until a candidate reference audio file having a strong correlation with the target audio file is obtained.
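A sketch of this peak-to-sidelobe test (steps 204b to 204f), operating on the full-mode correlation sequence from the previous sketch; the threshold is left as a parameter, since the patent treats it as a preset value:

```python
import numpy as np

def has_strong_correlation(r: np.ndarray, m_min: int, m_max: int,
                           threshold: float) -> tuple[bool, int]:
    """Return (strong?, m0) given R_xy(m) with index i <-> m = i - (N-1)."""
    i0 = int(np.argmax(r))                     # position of p0
    p0 = float(r[i0])
    # Side intervals [m0+m_min, m0+m_max] and [m0-m_max, m0-m_min],
    # clipped to the bounds of the sequence.
    right = r[min(len(r), i0 + m_min): i0 + m_max + 1]
    left = r[max(0, i0 - m_max): max(0, i0 - m_min + 1)]
    p1 = max(np.max(right, initial=-np.inf), np.max(left, initial=-np.inf))
    n = (len(r) + 1) // 2                      # original sequence length N
    return p0 / p1 > threshold, i0 - (n - 1)
```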
Step 205, when the first candidate reference audio file having a strong correlation with the target audio file is obtained, stopping the selection of further candidate reference audio files and taking that candidate reference audio file as the reference audio file.
The reference audio file is thus an audio file that has a strong correlation with the target audio file and carries a manually bound lyric file, i.e., the reference audio file can be considered to have a precisely matched lyric file.
In step 206, the time offset between the reference audio file and the target audio file is calculated.
After the reference audio file is selected, a time offset between the reference audio file and the target audio file is calculated based on a correlation coefficient between the two.
Optionally, the time offset is $\tau = m_0 / k_0$, where $m_0$ denotes the position offset corresponding to the maximum cross-correlation coefficient $p_0$, and $k_0$ denotes the preset sampling rate.
And step 207, correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation, and taking the corrected lyric file as the lyric file of the target audio file.
And after the time deviation tau is calculated, correcting the time stamp corresponding to the lyric file of the reference audio file by using the time deviation tau. In this embodiment, the time deviation τ is used to perform an overall correction on the time stamp corresponding to the lyric file, that is, the correction amplitude of the time stamp corresponding to each lyric in the lyric file is the time deviation τ. The corrected lyric file is the lyric file of the target audio file.
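As an illustration of this overall correction, the sketch below shifts every timestamp of an LRC-style lyric file by τ. Both the LRC format and the sign convention (adding τ) are assumptions, since the patent names neither:

```python
import re

def shift_lyrics(lrc_text: str, tau: float) -> str:
    """Add the time offset tau (seconds) to every [mm:ss.xx] timestamp."""
    def shift(match: re.Match) -> str:
        t = int(match.group(1)) * 60 + float(match.group(2)) + tau
        t = max(t, 0.0)                       # clamp instead of negative times
        return "[%02d:%05.2f]" % (t // 60, t % 60)

    return re.sub(r"\[(\d+):(\d+(?:\.\d+)?)\]", shift, lrc_text)
```

For example, `shift_lyrics("[00:12.50]hello", 1.0)` returns `"[00:13.50]hello"`, applying the same correction amplitude τ to every lyric line.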
In summary, in the method provided in this embodiment, the time offset between the reference audio file and the target audio file is calculated, and then the time stamp corresponding to the lyric file of the reference audio file is corrected according to the time offset, so as to obtain the lyric file of the target audio file; the problems of low efficiency and high cost of manually generating the lyric file in the related technology are solved; the technical effects of improving the efficiency of generating the lyric file and reducing the cost are achieved.
In addition, according to the method provided by the embodiment, the candidate reference audio file with strong correlation with the target audio file is selected as the reference audio file to perform subsequent time deviation calculation and correction, so that the accuracy of the finally generated lyric file is fully improved, and the system performance is ensured.
The points to be supplemented are: in order to further improve the efficiency of the cross-correlation coefficient calculation and save the calculation and processing overhead, the following two ways of calculating the cross-correlation coefficient are provided in the embodiment of the present invention.
In the first mode, the following steps can be included:
1. Extracting audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extracting audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n) = x(kn)$, $y'(n) = y(kn)$, the preset interval is $k$ audio samples, and $k$ is a positive integer.
The value of the preset interval k can be set after the factors of two aspects of calculation precision and calculation efficiency are comprehensively considered. For example, k may be set to 4.
2. A coarse cross-correlation coefficient sequence $R_{xy'}(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ is calculated according to the following formula:
$$R_{xy'}(m) = \sum_{n=0}^{(N-1)/k} x'(n+m)\, y'(n);$$
where $m \in [-(N-1)/k,\, (N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer.
3. The position offset $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R_{xy'}(m)$ is obtained.
The position offset $m_1$ is a coarse position offset.
4. With the position offset between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, a candidate audio data truncated sequence $x''(n)$ and a target audio data truncated sequence $y''(n)$ of a target length are truncated from the corresponding positions of $x(n)$ and $y(n)$, respectively.
5. The precise cross-correlation coefficient sequence $R_{xy''}(m)$ between the candidate audio data truncated sequence $x''(n)$ and the target audio data truncated sequence $y''(n)$ is calculated according to the following formula:
$$R_{xy''}(m) = \sum_{n=0}^{N_0} x''(n+m)\, y''(n);$$
where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, and $N_0$ denotes the target length and is a preset value; the position offset $m_2$ corresponding to the maximum value in the precise cross-correlation coefficient sequence $R_{xy''}(m)$ is the precise position offset.
In the first mode, a coarse position offset is computed first and then refined into a precise position offset. As the formula for the cross-correlation coefficient sequence shows, the computational complexity for two audio data sequences of length $N$ is $O(N^2)$. The first mode therefore reduces the computation time to roughly $1/k^2$ of the original; for example, when $k = 4$, the computation time is reduced to about $1/16$.
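A compact sketch of the first mode, under the assumptions that both sequences have equal length and that the refinement radius is $a = k$; the decimation factor `k` and the excerpt length `n0` (the patent's $N_0$) are illustrative presets:

```python
import numpy as np

def coarse_to_fine_offset(x: np.ndarray, y: np.ndarray, k: int = 4,
                          n0: int = 4096) -> int:
    """Two-stage search: coarse offset on decimated sequences, then a
    precise offset within [k*m1 - a, k*m1 + a] with a = k."""
    xd, yd = x[::k], y[::k]                    # x'(n) = x(kn), y'(n) = y(kn)
    rc = np.correlate(xd, yd, mode="full")     # coarse R_xy'(m)
    m1 = int(np.argmax(rc)) - (len(yd) - 1)    # coarse position offset

    # Truncate excerpts of x and y aligned at the offset k*m1.
    xs = x[max(0, k * m1): max(0, k * m1) + n0]
    ys = y[max(0, -k * m1): max(0, -k * m1) + n0]
    n = min(len(xs), len(ys))
    xs, ys = xs[:n], ys[:n]

    a = k
    best_m, best_r = k * m1, -np.inf
    for d in range(-a, a + 1):                 # refine: m = k*m1 + d
        if d >= 0:
            r = float(np.dot(xs[d:], ys[:n - d]))
        else:
            r = float(np.dot(xs[:n + d], ys[-d:]))
        if r > best_r:
            best_r, best_m = r, k * m1 + d
    return best_m                              # precise position offset m2
```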
In the second approach, FFT (Fast Fourier transform) is used to calculate the cross-correlation coefficients:
the method for calculating the cross-correlation coefficient sequence by using FFT can be derived from the calculation formula of the cross-correlation coefficient sequence:
R_xy=IFFT(conj(FFT(y))×FFT(x));
where conj () denotes the conjugate operation.
The second mode can be implemented with existing mature and efficient FFT modules, such as the FFTW ("Fastest Fourier Transform in the West") library. Using the FFT to calculate the cross-correlation coefficients improves the efficiency of the calculation and saves computation and processing overhead.
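A sketch of the second mode using NumPy's real FFT; the zero-padding length and the lag reordering are implementation details not specified by the patent:

```python
import numpy as np

def xcorr_fft(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """FFT-based evaluation of R_xy = IFFT(conj(FFT(y)) * FFT(x)).

    Zero-padding to at least 2N-1 points makes the circular correlation
    equal to the linear one; the result is reordered so that index i
    corresponds to offset m = i - (N - 1), matching np.correlate(x, y,
    "full").
    """
    N = len(x)
    L = 1 << (2 * N - 1).bit_length()          # next power of two >= 2N-1
    X = np.fft.rfft(x, L)
    Y = np.fft.rfft(y, L)
    r = np.fft.irfft(np.conj(Y) * X, L)
    return np.concatenate([r[L - (N - 1):], r[:N]])  # lags -(N-1) .. N-1
```

This replaces the $O(N^2)$ direct sum with $O(N \log N)$ work, which matters for the sequence lengths produced by, say, a 10 s excerpt at 8 kHz.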
The following are embodiments of the apparatus of the present invention that may be used to perform embodiments of the method of the present invention. For details which are not disclosed in the embodiments of the apparatus of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 3, a block diagram of an apparatus for generating a lyric file according to an embodiment of the present invention is shown. The device can be applied to any electronic equipment with computing processing capability. The apparatus may include: an acquisition module 310, a calculation module 320, and a correction module 330.
The obtaining module 310 is configured to obtain a reference audio file corresponding to a target audio file to be processed, where the reference audio file and the target audio file belong to different versions of the same song.
A calculating module 320 for calculating a time offset between the reference audio file and the target audio file.
And the correcting module 330 is configured to correct the timestamp corresponding to the lyric file of the reference audio file according to the time offset, and use the corrected lyric file as the lyric file of the target audio file.
In summary, in the apparatus provided in this embodiment, the time offset between the reference audio file and the target audio file is calculated, and then the time stamp corresponding to the lyric file of the reference audio file is corrected according to the time offset, so as to obtain the lyric file of the target audio file; the problems of low efficiency and high cost of manually generating the lyric file in the related technology are solved; the technical effects of improving the efficiency of generating the lyric file and reducing the cost are achieved.
Referring to fig. 4, a block diagram of an apparatus for generating a lyric file according to another embodiment of the present invention is shown. The device can be applied to any electronic equipment with computing processing capability. The apparatus may include: an acquisition module 310, a calculation module 320, and a correction module 330.
The obtaining module 310 is configured to obtain a reference audio file corresponding to a target audio file to be processed, where the reference audio file and the target audio file belong to different versions of the same song.
A calculating module 320 for calculating a time offset between the reference audio file and the target audio file.
And the correcting module 330 is configured to correct the timestamp corresponding to the lyric file of the reference audio file according to the time offset, and use the corrected lyric file as the lyric file of the target audio file.
Optionally, the obtaining module 310 includes: an obtaining sub-module 310a, a sorting sub-module 310b, a selecting sub-module 310c, a detecting sub-module 310d, and a determining sub-module 310e.
The obtaining sub-module 310a is configured to obtain at least one candidate reference audio file corresponding to the target audio file, where each candidate reference audio file and the target audio file belong to different versions of the same song.
The sorting sub-module 310b is configured to sort the at least one candidate reference audio file according to a preset sorting rule.
The selecting sub-module 310c is configured to sequentially select the candidate reference audio files one by one according to the sorting result.
The detection sub-module 310d is configured to detect whether there is a strong correlation between the selected candidate reference audio file and the target audio file.
The determining sub-module 310e is configured to, when the first candidate reference audio file having a strong correlation with the target audio file is obtained, stop selecting further candidate reference audio files and use that candidate reference audio file as the reference audio file.
Optionally, the detection sub-module 310d includes: a calculating unit 310d1, a first selecting unit 310d2, an obtaining unit 310d3, a second selecting unit 310d4, a detecting unit 310d5, and a determining unit 310d6.
The calculating unit 310d1 is configured to calculate a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, where the cross-correlation coefficient sequence includes at least one cross-correlation coefficient.
The first selecting unit 310d2 is configured to select the maximum cross-correlation coefficient $p_0$ from the cross-correlation coefficient sequence.
The obtaining unit 310d3 is configured to obtain the position offset $m_0$ corresponding to the maximum value $p_0$.
The second selecting unit 310d4 is configured to select, according to the position offset $m_0$, the maximum value $p_1$ from the correlation coefficients in a first offset interval $[m_0+m_{min},\, m_0+m_{max}]$ and a second offset interval $[m_0-m_{max},\, m_0-m_{min}]$, where $1 \le m_{min} < m_{max}$.
The detection unit 310d5 is configured to detect whether the ratio $p_0/p_1$ between the maximum value $p_0$ and the maximum value $p_1$ is greater than a preset threshold.
The determining unit 310d6 is configured to determine that the selected candidate reference audio file has a strong correlation with the target audio file when the ratio $p_0/p_1$ is greater than the preset threshold.
Optionally, the calculating unit 310d1 includes: a sampling sub-unit 310d11, an extracting sub-unit 310d12, and a calculating sub-unit 310d13.
The sampling sub-unit 310d11 is configured to sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sample sequence, and sample the target audio sample sequence from the target audio file at the preset sampling rate.
The extracting sub-unit 310d12 is configured to extract audio data with a preset length from the same position of the candidate audio sample sequence and the target audio sample sequence, so as to obtain a candidate audio data sequence and a target audio data sequence, respectively.
The calculating subunit 310d13 is configured to calculate a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
Optionally, the computing subunit 310d13 is specifically configured to:
calculating a cross-correlation coefficient sequence $R_{xy}(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ according to the following formula:
$$R_{xy}(m) = \sum_{n=0}^{N-1} x(n+m)\, y(n);$$
where $m \in [-(N-1),\, N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
Optionally, the computing subunit 310d13 is specifically configured to:
extracting audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extracting audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n) = x(kn)$, $y'(n) = y(kn)$, the preset interval is $k$ audio samples, and $k$ is a positive integer;
calculating a coarse cross-correlation coefficient sequence $R_{xy'}(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ according to the following formula:
$$R_{xy'}(m) = \sum_{n=0}^{(N-1)/k} x'(n+m)\, y'(n);$$
where $m \in [-(N-1)/k,\, (N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer;
obtaining the position offset $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R_{xy'}(m)$;
with the position offset between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, truncating a candidate audio data truncated sequence $x''(n)$ and a target audio data truncated sequence $y''(n)$ of a target length from the corresponding positions of $x(n)$ and $y(n)$, respectively;
calculating a precise cross-correlation coefficient sequence $R_{xy''}(m)$ between the candidate audio data truncated sequence $x''(n)$ and the target audio data truncated sequence $y''(n)$ according to the following formula:
$$R_{xy''}(m) = \sum_{n=0}^{N_0} x''(n+m)\, y''(n);$$
where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, and $N_0$ denotes the target length and is a preset value; the position offset $m_2$ corresponding to the maximum value in the precise cross-correlation coefficient sequence $R_{xy''}(m)$ is the precise position offset.
Optionally, the obtaining sub-module 310a includes: a classification obtaining unit 310a1, a classification determining unit 310a2, and a file searching unit 310a3.
The classification obtaining unit 310a1 is configured to obtain the classification to which the target audio file belongs, wherein the classification is any one of the single-song class, the live class, the accompaniment class, and the silenced class.
The classification determining unit 310a2 is configured to determine a target classification for finding the candidate reference audio file according to the classification to which the target audio file belongs.
The file searching unit 310a3 is configured to search, in the target classification, for audio files that meet a preset selection condition as the candidate reference audio files; wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-quality audio file.
Optionally, the classification determining unit 310a2 includes:

a first classification determining subunit 310a21, configured to determine the single-song class as the target classification when the classification to which the target audio file belongs is the single-song class; and/or

a second classification determining subunit 310a22, configured to determine the live class as the target classification when the classification to which the target audio file belongs is the live class; and/or

a third classification determining subunit 310a23, configured to determine the accompaniment class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the accompaniment class; and/or

a fourth classification determining subunit 310a24, configured to determine the silenced class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the silenced class.
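By way of illustration only, the four cases above amount to a small lookup table; the class names in this sketch are illustrative stand-ins, not terms from the implementation:

```python
# Illustrative mapping from the classification of the target audio file
# to the classifications searched for candidate reference audio files.
TARGET_CLASSIFICATIONS = {
    "single_song":   ["single_song"],
    "live":          ["live"],
    "accompaniment": ["accompaniment", "single_song", "live"],
    "silenced":      ["silenced", "single_song", "live"],
}

def target_classifications(target_class: str) -> list[str]:
    """Return the classifications to search for a given target class."""
    return TARGET_CLASSIFICATIONS[target_class]
```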
In summary, the apparatus provided in this embodiment calculates the time offset between the reference audio file and the target audio file, and then corrects the timestamps of the lyric file of the reference audio file according to the time offset to obtain the lyric file of the target audio file. This solves the problems of low efficiency and high cost when lyric files are generated manually, improving the efficiency of lyric file generation and reducing cost.
In addition, the apparatus provided in this embodiment selects a candidate reference audio file having a strong correlation with the target audio file as the reference audio file for the subsequent time offset calculation and correction, which substantially improves the accuracy of the finally generated lyric file and helps ensure system performance.
In addition, when calculating the cross-correlation coefficient sequence, the apparatus provided in this embodiment adopts a coarse-to-fine calculation mode, which further improves the efficiency of the cross-correlation calculation and saves computation and processing overhead.
It should be noted that when the apparatus for generating a lyric file provided in the above embodiment generates a lyric file, the division into the above functional modules is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus for generating a lyric file provided in the above embodiment and the embodiments of the method for generating a lyric file belong to the same concept; for the specific implementation process, refer to the method embodiments, which is not repeated here.
It should be understood that, as used herein, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein includes any and all possible combinations of one or more of the associated listed items.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (16)

1. A method of generating a lyric file, the method comprising:
acquiring a reference audio file corresponding to a target audio file to be processed, wherein the reference audio file and the target audio file belong to different versions of the same song;
calculating a time offset between the reference audio file and the target audio file;
and correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation, and taking the corrected lyric file as the lyric file of the target audio file.
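By way of illustration only (not part of the claims), the correction step in the last limitation can be pictured as shifting every timestamp of the reference lyric file by the computed time offset. The sketch below assumes a plain LRC-style file with [mm:ss.xx] line tags, which is a simplification of real lyric formats:

```python
import re

# Shift every [mm:ss.xx] tag by offset_s seconds; a positive offset delays
# the lyrics, a negative offset advances them (clamped at zero).
TAG = re.compile(r"\[(\d+):(\d+(?:\.\d+)?)\]")

def shift_lyrics(lrc_text: str, offset_s: float) -> str:
    def repl(match):
        t = int(match.group(1)) * 60 + float(match.group(2)) + offset_s
        t = max(t, 0.0)
        return "[%02d:%05.2f]" % (int(t // 60), t % 60)
    return TAG.sub(repl, lrc_text)

print(shift_lyrics("[00:12.30]some lyric line", 1.5))
# -> [00:13.80]some lyric line
```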
2. The method according to claim 1, wherein the obtaining of the reference audio file corresponding to the target audio file to be processed comprises:
acquiring at least one candidate reference audio file corresponding to the target audio file, wherein each candidate reference audio file and the target audio file belong to different versions of the same song;
sorting the at least one candidate reference audio file according to a preset sorting rule;
sequentially selecting the candidate reference audio files one by one according to the sorting result;
detecting whether the selected candidate reference audio file has strong correlation with the target audio file;
and when the first candidate reference audio file having a strong correlation with the target audio file is obtained, stopping selecting a next candidate reference audio file, and taking the first candidate reference audio file having the strong correlation with the target audio file as the reference audio file.
3. The method of claim 2, wherein the detecting whether the selected candidate reference audio file has a strong correlation with the target audio file comprises:
calculating a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, wherein the cross-correlation coefficient sequence comprises at least one cross-correlation coefficient;
selecting the maximum cross-correlation coefficient value $p_0$ from the cross-correlation coefficient sequence;

obtaining the position deviation $m_0$ corresponding to the maximum value $p_0$;

according to the position deviation $m_0$, selecting the maximum value $p_1$ from the cross-correlation coefficients corresponding to a first position deviation interval $[m_0+m_{\min},\,m_0+m_{\max}]$ and a second position deviation interval $[m_0-m_{\max},\,m_0-m_{\min}]$, where $1 \le m_{\min} < m_{\max}$;

detecting whether the ratio $p_0/p_1$ between the maximum value $p_0$ and the maximum value $p_1$ is greater than a preset threshold;

if the ratio $p_0/p_1$ is greater than the preset threshold, determining that the selected candidate reference audio file has a strong correlation with the target audio file.
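By way of illustration only (not part of the claims), this limitation is a peak-to-sidelobe test: the peak must dominate the best correlation value found in two side intervals around it. A minimal NumPy sketch, assuming the correlation values near the peak are positive and using illustrative parameter defaults:

```python
import numpy as np

def has_strong_correlation(r, m_min=5, m_max=50, threshold=2.0):
    """Return True if the peak p0 of the cross-correlation sequence r is
    more than `threshold` times the largest side value p1."""
    i0 = int(np.argmax(r))   # index of the peak, i.e. position deviation m0
    p0 = r[i0]
    # Values in [m0 + m_min, m0 + m_max] and [m0 - m_max, m0 - m_min];
    # Python slicing silently clips intervals that run past the ends of r.
    side = np.concatenate([r[i0 + m_min : i0 + m_max + 1],
                           r[max(0, i0 - m_max) : max(0, i0 - m_min + 1)]])
    if side.size == 0 or side.max() <= 0:
        return True          # no meaningful sidelobe to compare against
    return p0 / side.max() > threshold
```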
4. The method of claim 3, wherein the calculating the sequence of cross-correlation coefficients between the selected candidate reference audio file and the target audio file comprises:
sampling from the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sampling sequence, and sampling from the target audio file at the preset sampling rate to obtain a target audio sampling sequence;
extracting audio data with preset length from the same position of the candidate audio sampling sequence and the target audio sampling sequence to respectively obtain a candidate audio data sequence and a target audio data sequence;
a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence is calculated.
5. The method of claim 4, wherein the calculating the sequence of cross-correlation coefficients between the candidate audio data sequence and the target audio data sequence comprises:
calculating a cross-correlation coefficient sequence $R\_xy(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ according to the following formula:

$$R\_xy(m)=\sum_{n=0}^{N-1} x(n+m)\,y(n);$$

where $m \in [-(N-1),\,N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
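By way of illustration only (not part of the claims), the sum above is exactly NumPy's full-mode cross-correlation of two length-N sequences, so the whole claim reduces to a few lines:

```python
import numpy as np

def cross_correlation_sequence(x, y):
    """Compute R_xy(m) = sum_n x(n+m) * y(n) for m in [-(N-1), N-1]."""
    r = np.correlate(x, y, mode="full")     # length 2N - 1
    m = np.arange(-(len(y) - 1), len(x))    # position deviation per entry
    return m, r

# Usage: the deviation at the peak recovers the shift between the versions.
# m, r = cross_correlation_sequence(x, y); m0 = m[np.argmax(r)]
```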
6. The method of claim 4, wherein the calculating the sequence of cross-correlation coefficients between the candidate audio data sequence and the target audio data sequence comprises:
extracting audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extracting audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n)=x(k \times n)$, $y'(n)=y(k \times n)$, the preset interval is $k$ audio data samples, and $k$ is a positive integer;

calculating a coarse cross-correlation coefficient sequence $R\_xy'(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ according to the following formula:

$$R\_xy'(m)=\sum_{n=0}^{(N-1)/k} x'(n+m)\,y'(n);$$

where $m \in [-(N-1)/k,\,(N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer;

obtaining the position deviation $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R\_xy'(m)$;

with the position deviation between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, truncating a candidate audio data truncation sequence $x''(n)$ and a target audio data truncation sequence $y''(n)$ of a target length from the corresponding positions of $x(n)$ and $y(n)$;

calculating an exact cross-correlation coefficient sequence $R\_xy''(m)$ between the candidate audio data truncation sequence $x''(n)$ and the target audio data truncation sequence $y''(n)$ according to the following formula:

$$R\_xy''(m)=\sum_{n=0}^{N_0} x''(n+m)\,y''(n);$$

where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, $N_0$ denotes the target length and is a preset value; the position deviation $m_2$ corresponding to the maximum value in the exact cross-correlation coefficient sequence $R\_xy''(m)$ is the exact position deviation.
7. The method according to any one of claims 2 to 6, wherein the obtaining at least one candidate reference audio file corresponding to the target audio file comprises:
obtaining the classification to which the target audio file belongs, wherein the classification is any one of a single-song class, a live class, an accompaniment class, and a silenced class;
determining a target classification for searching the candidate reference audio file according to the classification to which the target audio file belongs;
searching, in the target classification, for audio files meeting a preset selection condition as the candidate reference audio files; wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-sound-quality audio file.
8. The method of claim 7, wherein determining a target classification for finding the candidate reference audio file according to the classification to which the target audio file belongs comprises:
when the classification to which the target audio file belongs is the single-song class, determining the single-song class as the target classification; or,

when the classification to which the target audio file belongs is the live class, determining the live class as the target classification; or,

when the classification to which the target audio file belongs is the accompaniment class, determining the accompaniment class, the single-song class, and the live class as the target classification; or,

when the classification to which the target audio file belongs is the silenced class, determining the silenced class, the single-song class, and the live class as the target classification.
9. An apparatus for generating a lyric file, the apparatus comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring a reference audio file corresponding to a target audio file to be processed, and the reference audio file and the target audio file belong to different versions of the same song;
a calculation module for calculating a time offset between the reference audio file and the target audio file;
and the correction module is used for correcting the time stamp corresponding to the lyric file of the reference audio file according to the time deviation and taking the corrected lyric file as the lyric file of the target audio file.
10. The apparatus of claim 9, wherein the obtaining module comprises: an obtaining submodule, a sorting submodule, a selection submodule, a detection submodule, and a determining submodule;
the obtaining submodule is used for obtaining at least one candidate reference audio file corresponding to the target audio file, and each candidate reference audio file and the target audio file belong to different versions of the same song;
the sorting submodule is used for sorting the at least one candidate reference audio file according to a preset sorting rule;
the selection submodule is used for sequentially selecting the candidate reference audio files one by one according to the sorting result;
the detection submodule is used for detecting whether the selected candidate reference audio file has strong correlation with the target audio file;
the determining sub-module is configured to, when a candidate reference audio file having a strong correlation with the target audio file is obtained, stop selecting a next candidate reference audio file, and use the candidate reference audio file having the strong correlation with the target audio file as the reference audio file.
11. The apparatus of claim 10, wherein the detection submodule comprises: a calculation unit, a first selection unit, an acquisition unit, a second selection unit, a detection unit, and a determination unit;

The calculation unit is used for calculating a cross-correlation coefficient sequence between the selected candidate reference audio file and the target audio file, wherein the cross-correlation coefficient sequence includes at least one cross-correlation coefficient;
the first selecting unit is used for selecting the maximum value p of the cross correlation coefficient from the cross correlation coefficient sequence0
The obtaining unit is used for obtaining the maximum value p0Corresponding position deviation m0
The second selection unit is used for selecting the position deviation m0In a first position deviation interval m0+mmin,m0+mmax]And a second position deviation interval [ m0-mmax,m0-mmin]Selecting maximum value p from corresponding correlation coefficients1,1≤mmin<mmax
The detection unit is used for detecting the maximum value p0And the maximum value p1Ratio p between0/p1Whether the threshold value is greater than a preset threshold value;
the determination unit is used for determining the ratio p0/p1And when the correlation value is larger than the preset threshold value, determining that the selected candidate reference audio file and the target audio file have strong correlation.
12. The apparatus of claim 11, wherein the calculation unit comprises: a sampling subunit, an extraction subunit, and a calculation subunit;
the sampling subunit is configured to sample the selected candidate reference audio file at a preset sampling rate to obtain a candidate audio sampling sequence, and sample the target audio file at the preset sampling rate to obtain a target audio sampling sequence;
the extraction subunit is configured to extract audio data with a preset length from the same position of the candidate audio sample sequence and the target audio sample sequence, so as to obtain a candidate audio data sequence and a target audio data sequence respectively;
the calculating subunit is configured to calculate a cross-correlation coefficient sequence between the candidate audio data sequence and the target audio data sequence.
13. The apparatus according to claim 12, wherein the calculation subunit is specifically configured to:
calculate a cross-correlation coefficient sequence $R\_xy(m)$ between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ according to the following formula:

$$R\_xy(m)=\sum_{n=0}^{N-1} x(n+m)\,y(n);$$

where $m \in [-(N-1),\,N-1]$, $0 \le n \le N-1$, $0 \le n+m \le N-1$, and $N$ is a positive integer.
14. The apparatus according to claim 12, wherein the calculation subunit is specifically configured to:
extract audio data from the candidate audio data sequence $x(n)$ at a preset interval to obtain a candidate audio data extraction sequence $x'(n)$, and extract audio data from the target audio data sequence $y(n)$ at the preset interval to obtain a target audio data extraction sequence $y'(n)$; where $x'(n)=x(k \times n)$, $y'(n)=y(k \times n)$, the preset interval is $k$ audio data samples, and $k$ is a positive integer;

calculate a coarse cross-correlation coefficient sequence $R\_xy'(m)$ between the candidate audio data extraction sequence $x'(n)$ and the target audio data extraction sequence $y'(n)$ according to the following formula:

$$R\_xy'(m)=\sum_{n=0}^{(N-1)/k} x'(n+m)\,y'(n);$$

where $m \in [-(N-1)/k,\,(N-1)/k]$, $0 \le n \le (N-1)/k$, $0 \le n+m \le (N-1)/k$, and $N$ is a positive integer;

obtain the position deviation $m_1$ corresponding to the maximum value in the coarse cross-correlation coefficient sequence $R\_xy'(m)$;

with the position deviation between the candidate audio data sequence $x(n)$ and the target audio data sequence $y(n)$ set to $k \times m_1$, truncate a candidate audio data truncation sequence $x''(n)$ and a target audio data truncation sequence $y''(n)$ of a target length from the corresponding positions of $x(n)$ and $y(n)$;

calculate an exact cross-correlation coefficient sequence $R\_xy''(m)$ between the candidate audio data truncation sequence $x''(n)$ and the target audio data truncation sequence $y''(n)$ according to the following formula:

$$R\_xy''(m)=\sum_{n=0}^{N_0} x''(n+m)\,y''(n);$$

where $m \in [k \times m_1 - a,\, k \times m_1 + a]$, $a \ge k$, $N_0$ denotes the target length and is a preset value; the position deviation $m_2$ corresponding to the maximum value in the exact cross-correlation coefficient sequence $R\_xy''(m)$ is the exact position deviation.
15. The apparatus of any one of claims 10 to 14, wherein the obtaining submodule comprises: a classification acquisition unit, a classification determining unit, and a file searching unit;

The classification acquisition unit is configured to acquire the classification to which the target audio file belongs, wherein the classification is any one of a single-song class, a live class, an accompaniment class, and a silenced class;

The classification determining unit is configured to determine, according to the classification to which the target audio file belongs, a target classification in which to search for the candidate reference audio files;

The file searching unit is configured to search, in the target classification, for audio files meeting a preset selection condition as the candidate reference audio files; wherein the preset selection condition includes at least one of the following: the audio file has a manually bound lyric file, and the audio file is a high-sound-quality audio file.
16. The apparatus of claim 15, wherein the classification determining unit comprises:
a first classification determining subunit, configured to determine the single-song class as the target classification when the classification to which the target audio file belongs is the single-song class; and/or

a second classification determining subunit, configured to determine the live class as the target classification when the classification to which the target audio file belongs is the live class; and/or

a third classification determining subunit, configured to determine the accompaniment class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the accompaniment class; and/or

a fourth classification determining subunit, configured to determine the silenced class, the single-song class, and the live class as the target classification when the classification to which the target audio file belongs is the silenced class.