CN107045867A - Automatic composing method, device and terminal device - Google Patents
- Publication number
- CN107045867A CN107045867A CN201710175115.8A CN201710175115A CN107045867A CN 107045867 A CN107045867 A CN 107045867A CN 201710175115 A CN201710175115 A CN 201710175115A CN 107045867 A CN107045867 A CN 107045867A
- Authority
- CN
- China
- Prior art keywords
- frame
- music
- note
- energy value
- difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The application proposes an automatic composing method, device and terminal device. The automatic composing method includes: receiving the music file of a preceding music segment, where the music file includes the audio data or the music description information of the preceding segment; extracting the frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, thereby realizing automatic composition. The application can realize automatic composition, improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on automatic composition.
Description
Technical field
The application relates to the technical field of audio signal processing, and in particular to an automatic composing method, device and terminal device.
Background art
With the application of computer technology to music processing, computer music has emerged. As a newly born art form, computer music has gradually penetrated into composition, instrument performance, education, entertainment and other aspects of music. Automatic composition based on artificial intelligence is a relatively new research direction within computer music and has received great attention from researchers in related fields in recent years.
Existing automatic composing methods based on artificial intelligence mainly fall into two categories: automatic composition based on heuristic search and automatic composition based on genetic algorithms. However, heuristic-search-based automatic composition is only applicable to short pieces, because its search efficiency declines exponentially with the length of the piece, so the method is infeasible for longer music. Automatic composition based on genetic algorithms inherits typical shortcomings of genetic algorithms, for example a strong dependence on the initial population and the difficulty of precisely selecting genetic operators.
Content of the invention
The purpose of the application is to solve, at least to some extent, one of the technical problems in the related art.
Therefore, the first purpose of the application is to propose an automatic composing method. By building a music frequency-band feature combination model and a music prediction model, the method realizes automatic composition. It is a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
The second purpose of the application is to propose an automatic composition device.
The third purpose of the application is to propose a terminal device.
The fourth purpose of the application is to propose a storage medium containing computer-executable instructions.
To achieve these goals, the automatic composing method of the first-aspect embodiment of the application includes: receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment; extracting the frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composing method of the embodiment of the application, after the music file of the preceding music segment is received, the frame-level audio features of the music corresponding to the music file are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thereby realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the automatic composition device of the second-aspect embodiment of the application includes: a receiving module for receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment; an extraction module for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module; and an obtaining module for obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model, and for obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composition device of the embodiment of the application, after the receiving module receives the music file of the preceding music segment, the extraction module extracts the frame-level audio features of the corresponding music; the obtaining module then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thereby be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the terminal device of the third-aspect embodiment of the application includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
To achieve these goals, the fourth-aspect embodiment of the application provides a storage medium containing computer-executable instructions, the computer-executable instructions being used, when executed by a computer processor, to perform the method described above.
Additional aspects and advantages of the application will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application;
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application;
Fig. 3 is a schematic diagram of one embodiment of the topological structure in the automatic composing method of the application;
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 5 is a schematic coordinate representation of energy values in the automatic composing method of the application;
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 8 is a schematic diagram of another embodiment of the topological structure in the automatic composing method of the application;
Fig. 9 is a schematic structural diagram of one embodiment of the automatic composition device of the application;
Fig. 10 is a schematic structural diagram of another embodiment of the automatic composition device of the application;
Fig. 11 is a schematic structural diagram of one embodiment of the terminal device of the application.
Embodiment
Embodiments of the application are described in detail below, examples of which are shown in the accompanying drawings, in which the same or similar reference numbers throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the application, and are not to be construed as limitations on the application. On the contrary, the embodiments of the application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application. As shown in Fig. 1, the automatic composing method can include:
Step 101: receive the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment.
Here, the audio data or music description information of the preceding segment means the audio data or music description information of a given short piece of music; the music that follows it can then be predicted from this given audio data or music description information.
The music description information can generally be converted to audio data; it can be, for example, a Musical Instrument Digital Interface (MIDI) file.
Step 102: extract the frame-level audio features of the music corresponding to the music file.
Step 103: obtain frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model.
Step 104: obtain the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composing method above, after the music file of the preceding music segment is received, the frame-level audio features of the corresponding music are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thereby realized, its efficiency and feasibility can be improved, and the influence of subjective factors on automatic composition is reduced.
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application. As shown in Fig. 2, before step 103 the method can further include:
Step 201: collect music files and convert them to audio files of the same format.
Specifically, a large amount of training data can be obtained by crawling music files from the internet. These music files can be audio data or music description information, for example MIDI files. The collected files can then be converted to audio files of the same format; the format only needs to support the Fast Fourier Transform (FFT), for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and is illustrated taking the ".PCM" format as an example. It should be noted that if a collected music file is music description information such as a MIDI file, the MIDI file first needs to be converted to an audio file, which is then converted to a ".PCM" audio file.
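As a minimal sketch of step 201 (assuming the ffmpeg command-line tool is available; the 16 kHz mono 16-bit target and the file paths are illustrative, not specified by the patent), collected audio files could be normalized to raw ".PCM" data as follows; rendering MIDI description files to audio first (e.g., with a software synthesizer) is omitted here:

```python
import subprocess

def to_pcm(src: str, dst: str = "out.pcm", rate: int = 16000) -> None:
    """Decode any ffmpeg-readable audio file to raw 16-bit mono PCM."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,                          # decode the input
         "-f", "s16le", "-ac", "1", "-ar", str(rate), dst],  # raw mono PCM output
        check=True)
```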
Step 202: extract the frame-level audio features of the audio files.
Step 203: determine the topological structure of the music frequency-band feature combination model.
Specifically, the topological structure is a neural network structure with two opposing scan directions. This embodiment takes opposing recurrent neural networks (RNN) as an example: the topology consists of two independent RNNs and one connection unit, as shown in Fig. 3, which is a schematic diagram of one embodiment of the topological structure in the automatic composing method of the application. The two independent RNNs, named LF_RNN and HF_RNN, are used for combining multi-bin features over the low-frequency range and over the high-frequency range, respectively.
For a given frame T_m, the input of LF_RNN is the energy values starting from the low-frequency end, E(T_m, F_i), i = 1, 2, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous (lower) frequency bin; its output L_i represents the energy value of the i-th frequency bin of frame T_m after low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy values starting from the high-frequency end, E(T_m, F_j), j = N/2, N/2-1, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous (higher) frequency bin; its output H_j represents the energy value of the j-th frequency bin of frame T_m after high-frequency information has been taken into account.
The connection unit is the concatenate node in Fig. 3: when i = j = k, it combines the two outputs into N(T_m, F_k), the energy value of the k-th frequency bin of frame T_m with information from the other frequency bins taken into account.
Step 204: train the music frequency-band feature combination model according to the determined topological structure and the frame-level audio features.
Specifically, when training the music frequency-band feature combination model, the training algorithm can be a neural-network training algorithm such as back propagation (BP); this embodiment does not limit the training algorithm used.
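For illustration only (assuming PyTorch; the mean-squared-error objective, the optimizer and the supervision targets are assumptions, since the patent only names back propagation and does not specify a loss), training the BandCombinationModel sketched above could proceed as follows:

```python
import torch

def train(model, frames, targets, epochs: int = 10, lr: float = 1e-3) -> None:
    # frames, targets: (num_frames, N/2) tensors of frame-level energies
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), targets)
        loss.backward()   # error back propagation, as the embodiment suggests
        opt.step()
```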
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 4, in the embodiment of Fig. 2, step 202 can include:
Step 401: apply a fixed-size Fast Fourier Transform to the audio file frame by frame.
Specifically, an FFT with a fixed number of points can be applied to the ".PCM" audio file frame by frame.
Step 402: calculate the energy value of every frame of the audio file at each frequency bin according to the result of the Fast Fourier Transform.
Fig. 5 is a schematic coordinate representation of the energy values in the automatic composing method of the application: it shows the energy value of each frame at each frequency bin, where the horizontal axis t represents the temporal frame, the vertical axis f represents the frequency bin, the coordinate E(t, f) represents the energy value, M is the total number of frames, and N is the number of FFT points.
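A minimal sketch of steps 401 and 402 (assuming NumPy; the frame size, hop size and magnitude-squared energy definition are illustrative assumptions) that builds the M x N/2 energy matrix E(t, f) of Fig. 5:

```python
import numpy as np

def frame_energies(pcm: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Fixed-size FFT per frame, returning E(t, f) with shape (M, N/2)."""
    frames = [pcm[i:i + n_fft] for i in range(0, len(pcm) - n_fft + 1, hop)]
    spectrum = np.fft.rfft(np.asarray(frames, dtype=float), n=n_fft, axis=1)
    return np.abs(spectrum[:, :n_fft // 2]) ** 2  # energy at frame t, bin f
```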
Step 403: determine the note ownership of every frame according to the energy values.
Specifically, at each frequency bin, the first and second frames of the audio file are taken to belong to the first note. It is then judged whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first and second frames, and the second difference is the difference between the maximum and the minimum of the energy values of the first and second frames. If it is, the third frame of the audio file is determined to belong to the first note, and the fourth frame up to the last frame are then judged successively in the same way.
If the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file is taken as the beginning of a second note, and the fourth frame is determined to belong to the second note as well. Starting from the fifth frame, it is judged whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the average of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third and fourth frames. The note ownership of the fifth frame is thus determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
That is, determining the note ownership of every frame can be done by processing each frequency bin as follows. Frames T_1 and T_2 are taken to belong to the first note, and judgment starts from frame T_3: if |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)), then frame T_3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2. Otherwise, frame T_3 is taken as the beginning of the second note, frame T_4 is determined to belong to the second note, and judgment resumes from frame T_5, still by the rule |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)), until the note ownership of all frames has been determined.
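A minimal sketch of this rule (assuming NumPy; one reading of the patent is that each judgment compares a frame against the statistics of the two frames that seeded the current note, which is what is implemented here):

```python
import numpy as np

def assign_notes(energies: np.ndarray) -> list[int]:
    """Return a note index for every frame of one frequency bin."""
    notes = [0, 0]          # frames T_1 and T_2 seed the first note
    start = 0               # index of the current note's first seed frame
    t = 2
    while t < len(energies):
        seed = energies[start:start + 2]
        if abs(energies[t] - seed.mean()) < seed.max() - seed.min():
            notes.append(notes[-1])       # frame continues the current note
            t += 1
        else:
            new = notes[-1] + 1           # frame starts a new note...
            notes.append(new)
            if t + 1 < len(energies):
                notes.append(new)         # ...and the next frame seeds it too
            start = t
            t += 2
    return notes
```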
Step 404: calculate the energy value of each note, and obtain the frame-level audio features according to the energy values of the notes.
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 6, in the embodiment of Fig. 4, step 404 can include:
Step 601: calculate the average energy value of all frames contained in each note, as the energy value of that note.
Step 602: normalize the energy value of every frame included in each note to the energy value of the note it belongs to.
Step 603: filter out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features.
The predetermined threshold can be set according to system performance and/or implementation requirements at the time of implementation; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of the frames of each note can therefore be computed as the note's energy value E(i), and the energy value of every frame included in a note is then normalized to the energy value of that note. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy E_mean, since notes with small energy values are probably noise. That is, for each E(i), if E(i) < α·E_mean, the energy value of that note can be set to 0, where α·E_mean is the predetermined threshold mentioned above; the value of α can be determined according to the practical application situation, and this embodiment does not limit it.
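A minimal sketch of steps 601 to 603 (assuming NumPy; the value of α is illustrative, since the patent leaves it implementation-defined):

```python
import numpy as np

def note_level_features(energies, notes, alpha: float = 0.1) -> np.ndarray:
    """Average per note, filter quiet notes, and map back to frames."""
    energies, notes = np.asarray(energies, dtype=float), np.asarray(notes)
    note_energy = {n: energies[notes == n].mean() for n in np.unique(notes)}
    e_mean = np.mean(list(note_energy.values()))   # average note energy E_mean
    for n, e in note_energy.items():
        if e < alpha * e_mean:                     # probably noise: suppress
            note_energy[n] = 0.0
    # every frame's feature becomes the (filtered) energy of its note
    return np.array([note_energy[n] for n in notes])
```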
It should be noted that in the embodiment shown in Fig. 2, steps 201 to 204 can be performed sequentially with steps 101 to 102 or in parallel with them; the embodiment of the application does not limit this.
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 7, in the embodiment of Fig. 1, before step 104 the method can further include:
Step 701: determine the topological structure of the music prediction model.
In this embodiment the music prediction model uses an RNN model, as shown in Fig. 8, which is a schematic diagram of another embodiment of the topological structure in the automatic composing method of the application. The input of the RNN model shown in Fig. 8 is the output N(T_m, F_k) of the music frequency-band feature combination model together with the model output h_m for the previous frame, and its output is the energy values N(T_{m+1}, F_k) of the next frame.
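A minimal sketch of this prediction topology (assuming PyTorch; the hidden size and linear output head are illustrative assumptions), consuming the combined band features of each frame and predicting those of the next frame:

```python
import torch
import torch.nn as nn

class MusicPredictor(nn.Module):
    """RNN predicting next-frame band features N(T_{m+1}, F_k)."""

    def __init__(self, n_bins: int, hidden_size: int = 128):
        super().__init__()
        self.rnn = nn.RNN(input_size=n_bins, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_bins)

    def forward(self, band_feats, h=None):
        # band_feats: (batch, frames, N/2) sequence of N(T_m, F_k)
        states, h = self.rnn(band_feats, h)   # h carries the previous frames
        return self.out(states), h            # predicted next-frame energies

# To compose, one would presumably feed each predicted frame back in as the
# next input and roll the model out frame by frame (an assumption about how
# the predicted music is generated from the model).
```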
Step 702: train the music prediction model according to the output of the music frequency-band feature combination model and the determined topological structure.
It should be noted that steps 701 and 702 can be performed sequentially with steps 101 to 103 or in parallel with them; this embodiment does not limit this.
The automatic composing method above can realize automatic composition, improve its efficiency and feasibility, and reduce the influence of subjective factors on automatic composition. It is a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 9 is a schematic structural diagram of one embodiment of the automatic composition device of the application. The automatic composition device in this embodiment can serve as a terminal device, or as part of a terminal device, to realize the automatic composing method provided by the application. The terminal device can be a client device or a server device; the application does not limit the form of the terminal device.
As shown in Fig. 9, the automatic composition device can include: a receiving module 91, an extraction module 92 and an obtaining module 93.
The receiving module 91 is used for receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment. Here, the audio data or music description information of the preceding segment means the audio data or music description information of a given short piece of music, from which the music that follows can be predicted. The music description information can generally be converted to audio data and can be, for example, a MIDI file.
The extraction module 92 is used for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module 91.
The obtaining module 93 is used for obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model, and for obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composition device above, after the receiving module 91 receives the music file of the preceding music segment, the extraction module 92 extracts the frame-level audio features of the corresponding music; the obtaining module 93 then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thereby be realized, its efficiency and feasibility can be improved, and the influence of subjective factors on automatic composition is reduced.
Fig. 10 is a schematic structural diagram of another embodiment of the automatic composition device of the application. Compared with the device shown in Fig. 9, the difference is that the device shown in Fig. 10 can further include: a collection module 94, a conversion module 95, a determining module 96 and a training module 97.
The collection module 94 is used for collecting music files before the obtaining module 93 obtains the frame-level audio features carrying frequency-band information.
The conversion module 95 is used for converting the music files collected by the collection module 94 to audio files of the same format. Specifically, the collection module 94 can obtain a large amount of training data by crawling music files from the internet; the music files can be audio data or music description information, for example MIDI files. The conversion module 95 can then convert the music files to audio files of the same format; the format only needs to support the FFT, for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and is illustrated taking the ".PCM" format as an example. It should be noted that if a music file is music description information such as a MIDI file, the MIDI file first needs to be converted to an audio file, which is then converted to a ".PCM" audio file.
The extraction module 92 is further used for extracting the frame-level audio features of the audio files converted by the conversion module 95.
The determining module 96 is used for determining the topological structure of the music frequency-band feature combination model. Specifically, the topological structure determined by the determining module 96 is a neural network structure with two opposing scan directions; this embodiment takes opposing RNNs as an example. The topology consists of two independent RNNs and one connection unit, as shown in Fig. 3. The two independent RNNs, named LF_RNN and HF_RNN, are used for combining multi-bin features over the low-frequency range and over the high-frequency range, respectively.
For a given frame T_m, the input of LF_RNN is the energy values starting from the low-frequency end, E(T_m, F_i), i = 1, 2, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous (lower) frequency bin; its output L_i represents the energy value of the i-th frequency bin of frame T_m after low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy values starting from the high-frequency end, E(T_m, F_j), j = N/2, N/2-1, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous (higher) frequency bin; its output H_j represents the energy value of the j-th frequency bin of frame T_m after high-frequency information has been taken into account.
The connection unit is the concatenate node in Fig. 3: when i = j = k, it combines the two outputs into N(T_m, F_k), the energy value of the k-th frequency bin of frame T_m with information from the other frequency bins taken into account.
The training module 97 is used for training the music frequency-band feature combination model according to the topological structure determined by the determining module 96 and the frame-level audio features extracted by the extraction module 92. Specifically, when training the music frequency-band feature combination model, the training module 97 can use a neural-network-model training algorithm such as the BP algorithm; this embodiment does not limit the training algorithm used.
In this embodiment, the extraction module 92 can include: a transformation submodule 921, a calculating submodule 922, a determination submodule 923 and an acquisition submodule 924.
The transformation submodule 921 is used for applying a fixed-size Fast Fourier Transform to the audio file frame by frame; specifically, it can apply an FFT with a fixed number of points to the ".PCM" audio file frame by frame.
The calculating submodule 922 is used for calculating the energy value of every frame of the audio file at each frequency bin according to the result of the Fast Fourier Transform by the transformation submodule 921. Fig. 5 shows the energy value of each frame at each frequency bin in coordinate form, where the horizontal axis t represents the temporal frame, the vertical axis f represents the frequency bin, the coordinate E(t, f) represents the energy value, M is the total number of frames, and N is the number of FFT points.
The determination submodule 923 is used for determining the note ownership of every frame according to the energy values calculated by the calculating submodule 922.
The calculating submodule 922 is further used for calculating the energy value of each note.
The acquisition submodule 924 is used for obtaining the frame-level audio features according to the energy values of the notes calculated by the calculating submodule 922.
Specifically, the calculating submodule 922 calculates the average energy value of all frames contained in each note, as the energy value of that note, and normalizes the energy value of every frame included in each note to the energy value of the note it belongs to.
The acquisition submodule 924 filters out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features. The predetermined threshold can be set according to system performance and/or implementation requirements at the time of implementation; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of the frames of each note can therefore be computed as the note's energy value E(i), and the energy value of every frame included in a note is then normalized to the energy value of that note. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy E_mean, since notes with small energy values are probably noise. That is, for each E(i), if E(i) < α·E_mean, the energy value of that note can be set to 0, where α·E_mean is the predetermined threshold mentioned above; the value of α can be determined according to the practical application situation, and this embodiment does not limit it.
In this embodiment, the determination submodule 923 can include: a note determining unit 9231 and a judging unit 9232.
The note determining unit 9231 is used for determining, at each frequency bin, that the first and second frames of the audio file belong to the first note.
The judging unit 9232 is used for judging whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first and second frames, and the second difference is the difference between the maximum and the minimum of the energy values of the first and second frames.
The note determining unit 9231 is further used for determining, when the absolute value of the first difference is less than the second difference, that the third frame of the audio file belongs to the first note, and then judging the fourth frame up to the last frame successively for note ownership.
The note determining unit 9231 is further used for taking, when the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note.
The judging unit 9232 is further used for judging, starting from the fifth frame of the audio file, whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the average of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third and fourth frames; the note ownership of the fifth frame is determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
That is, the determination submodule 923 can determine the note ownership of every frame by processing each frequency bin as follows. The note determining unit 9231 takes frames T_1 and T_2 as belonging to the first note, and the judging unit 9232 starts judging from frame T_3: if |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)), then frame T_3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2. Otherwise frame T_3 is taken as the beginning of the second note, frame T_4 is determined to belong to the second note, and judgment resumes from frame T_5, still by the rule |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)), until the note ownership of all frames has been determined.
Further, the automatic composition device can also include: the determining module 96 and the training module 97.
The determining module 96 is used for determining the topological structure of the music prediction model before the obtaining module 93 obtains the predicted music. In this embodiment, the topological structure of the music prediction model determined by the determining module 96 is an RNN model, as shown in Fig. 8: the input of the RNN model is the output N(T_m, F_k) of the music frequency-band feature combination model together with the model output h_m for the previous frame, and its output is the energy values N(T_{m+1}, F_k) of the next frame.
The training module 97 is used for training the music prediction model according to the output of the music frequency-band feature combination model and the topological structure determined by the determining module 96.
The automatic composition device above can realize automatic composition, improve its efficiency and feasibility, and reduce the influence of subjective factors on automatic composition. It embodies a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 11 is a schematic structural diagram of one embodiment of the terminal device of the application. The terminal device in the application can realize the automatic composing method provided by the application; it can be a client device or a server device, and the application does not limit its form. The terminal device can include: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors realize the automatic composing method provided by the application.
Fig. 11 shows a block diagram of an exemplary terminal device 12 suitable for realizing embodiments of the application. The terminal device 12 shown in Fig. 11 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the application.
As shown in Fig. 11, the terminal device 12 takes the form of a general-purpose computing device. The components of the terminal device 12 can include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The terminal device 12 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the terminal device 12, including volatile and non-volatile media, removable and non-removable media.
The system memory 28 can include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The terminal device 12 can further comprise other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 can be used for reading and writing non-removable, non-volatile magnetic media (not shown in Fig. 11, commonly referred to as a "hard disk drive"). Although not shown in Fig. 11, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading and writing a removable non-volatile optical disk (for example a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM) or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 can include at least one program product having a group of (for example, at least one) program modules configured to perform the functions of the embodiments of the application.
A program/utility 40 having a group of (at least one) program modules 42 can be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the automatic composing method in the embodiments described herein.
The terminal device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device or a display 24), with one or more devices that enable a user to interact with the terminal device 12, and/or with any device (such as a network card or a modem) that enables the terminal device 12 to communicate with one or more other computing devices. This communication can be carried out through an input/output (I/O) interface 22. Moreover, the terminal device 12 can also communicate with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the internet) through a network adapter 20. As shown in Fig. 11, the network adapter 20 communicates with the other modules of the terminal device 12 through the bus 18. It should be understood that, although not shown in Fig. 11, other hardware and/or software modules can be used in conjunction with the terminal device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems and so on.
The processing unit 16 performs various functional applications and data processing by running the programs stored in the system memory 28, for example realizing the automatic composing method provided by the application.
It should be noted that in the description of the application, the terms "first", "second" and the like are only used for descriptive purposes and are not to be understood as indicating or implying relative importance. In addition, in the description of the application, unless otherwise indicated, "multiple" means two or more.
Any process or method description in a flow chart or otherwise described herein can be understood as representing a module, fragment or portion of code that includes one or more executable instructions for realizing the steps of a specific logical function or process, and the scope of the preferred embodiments of the application includes other realizations, in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
It should be appreciated that each part of the application can be realized with hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be realized with software or firmware that is stored in a memory and performed by a suitable instruction execution system. For example, if realized with hardware, as in another embodiment, they can be realized with any one of the following technologies well known in the art or a combination thereof: a discrete logic circuit having logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA) and so on.
The application also provides a storage medium containing computer-executable instructions, the computer-executable instructions being used, when executed by a computer processor, to perform the automatic composing method provided by the application.
The storage medium containing computer-executable instructions can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium can include a data signal propagated in a baseband or as part of a carrier wave, with computer-readable program code carried therein. Such a propagated data signal can take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted with any appropriate medium, including, but not limited to, wireless, electric wire, optical cable, RF and so on, or any suitable combination of the above.
Computer program code for performing the operations of the application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In situations involving a remote computer, the remote computer can be connected to the user's computer through a network of any kind, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example through the internet using an internet service provider).
Those skilled in the art can appreciate that all or part of the steps carried by the method of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the program, upon execution, includes one or a combination of the steps of the method embodiments.
In addition, each functional module in each embodiment of the application can be integrated in one processing module, each module can exist separately and physically, or two or more modules can be integrated in one module. The integrated module can be realized in the form of hardware or in the form of a software functional module. If the integrated module is realized in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disk or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that specific features, structures, materials or characteristics described in combination with the embodiment or example are contained in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in an appropriate manner in any one or more embodiments or examples.
Although the embodiments of the application have been shown and described above, it is to be understood that the above embodiments are exemplary and cannot be interpreted as limitations on the application; one of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the application.
Claims (16)
1. An automatic composing method, characterised by including:
receiving the music file of a preceding music segment, the music file of the preceding music segment including the audio data or the music description information of the preceding music segment;
extracting the frame-level audio features of the music corresponding to the music file;
obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model;
obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
2. The method according to claim 1, characterised in that, before the obtaining of the frame-level audio features carrying frequency-band information according to the frame-level audio features and the pre-built music frequency-band feature combination model, the method further includes:
collecting music files and converting the music files to audio files of the same format;
extracting the frame-level audio features of the audio files;
determining the topological structure of the music frequency-band feature combination model;
training the music frequency-band feature combination model according to the determined topological structure and the frame-level audio features.
3. The method according to claim 2, characterised in that extracting the frame-level audio features of the audio file comprises:
performing a fast Fourier transform of a fixed point count on the audio file frame by frame;
calculating the energy value of each frame of the audio file at each frequency point from the result of the fast Fourier transform;
determining the note to which each frame belongs according to the energy values; and
calculating the energy value of each note and obtaining the frame-level audio features from the energy values of the notes.
4. The method according to claim 3, characterised in that determining the note to which each frame belongs according to the energy values comprises:
at each frequency point, determining that the first frame and the second frame of the audio file belong to a first note;
judging whether the absolute value of a first difference is less than a second difference, the first difference being the difference between the energy value of the third frame of the audio file and the average energy value of the first frame through the second frame, and the second difference being the difference between the maximum and the minimum of the energy values of the first frame through the second frame; and
if so, determining that the third frame of the audio file belongs to the first note, and then judging the fourth frame and subsequent frames in turn until the note ownership of the last frame is determined (one reading of this test is sketched after the claims).
5. The method according to claim 4, characterised in that, after judging whether the absolute value of the first difference is less than the second difference, the method further comprises:
if the absolute value of the first difference is greater than or equal to the second difference, taking the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note; and
judging, from the fifth frame of the audio file onward, whether the absolute value of a third difference is less than a fourth difference, the third difference being the difference between the energy value of the fifth frame and the average energy value of the third frame through the fourth frame, and the fourth difference being the difference between the maximum and the minimum of the energy values of the third frame through the fourth frame; and so on, until the note ownership of the last frame of the audio file is determined.
6. The method according to claim 3, characterised in that calculating the energy value of each note and obtaining the frame-level audio features from the energy values of the notes comprises:
calculating the average energy value of all frames contained in each note as the energy value of that note;
normalizing the energy value of every frame contained in a note to the energy value of that note; and
filtering out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features (see the feature-extraction sketch after the claims).
7. The method according to claim 1, characterised in that, before obtaining the predicted music according to the frame-level audio features carrying band information and the pre-built music prediction model, the method may further comprise:
determining a topological structure of the music prediction model; and
training the music prediction model according to the output of the music band feature binding model and the determined topological structure.
8. An automatic composing device, characterised in that it comprises:
a receiving module for receiving a music file of the preceding music segment to be predicted from, the music file comprising the audio data or the music description information of that segment;
an extraction module for extracting frame-level audio features of the music corresponding to the music file received by the receiving module; and
an obtaining module for obtaining frame-level audio features carrying band information according to the frame-level audio features and a pre-built music band feature binding model, and for obtaining predicted music according to the frame-level audio features carrying band information and a pre-built music prediction model, so as to realize automatic composition.
9. The device according to claim 8, characterised in that it further comprises a collection module, a conversion module, a determining module, and a training module;
the collection module being for collecting music files before the obtaining module obtains the frame-level audio features carrying band information;
the conversion module being for converting the music files collected by the collection module into audio files of the same format;
the extraction module being further for extracting the frame-level audio features of the audio files converted by the conversion module;
the determining module being for determining a topological structure of the music band feature binding model; and
the training module being for training the music band feature binding model according to the topological structure determined by the determining module and the frame-level audio features extracted by the extraction module.
10. The device according to claim 9, characterised in that the extraction module comprises:
a transformation submodule for performing a fast Fourier transform of a fixed point count on the audio file frame by frame;
a calculation submodule for calculating the energy value of each frame of the audio file at each frequency point from the result of the transformation submodule's fast Fourier transform;
a determination submodule for determining the note to which each frame belongs according to the energy values calculated by the calculation submodule;
the calculation submodule being further for calculating the energy value of each note; and
an acquisition submodule for obtaining the frame-level audio features from the energy values of the notes calculated by the calculation submodule.
11. The device according to claim 10, characterised in that the determination submodule comprises:
a note determining unit for determining, at each frequency point, that the first frame and the second frame of the audio file belong to a first note; and
a judging unit for judging whether the absolute value of a first difference is less than a second difference, the first difference being the difference between the energy value of the third frame of the audio file and the average energy value of the first frame through the second frame, and the second difference being the difference between the maximum and the minimum of the energy values of the first frame through the second frame;
the note determining unit being further for determining, when the absolute value of the first difference is less than the second difference, that the third frame of the audio file belongs to the first note, and then judging the fourth frame and subsequent frames in turn until the note ownership of the last frame is determined.
12. The device according to claim 11, characterised in that:
the note determining unit is further for taking, when the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note; and
the judging unit is further for judging, from the fifth frame of the audio file onward, whether the absolute value of a third difference is less than a fourth difference, the third difference being the difference between the energy value of the fifth frame and the average energy value of the third frame through the fourth frame, and the fourth difference being the difference between the maximum and the minimum of the energy values of the third frame through the fourth frame, until the note ownership of the last frame of the audio file is determined.
13. The device according to claim 10, characterised in that:
the calculation submodule is specifically for calculating the average energy value of all frames contained in each note as the energy value of that note, and for normalizing the energy value of every frame contained in a note to the energy value of that note; and
the acquisition submodule is specifically for filtering out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features.
14. The device according to claim 8, characterised in that it further comprises a determining module and a training module;
the determining module being for determining a topological structure of the music prediction model before the obtaining module obtains the predicted music; and
the training module being for training the music prediction model according to the output of the music band feature binding model and the topological structure determined by the determining module.
15. A terminal device, characterised in that it comprises:
one or more processors; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
16. A storage medium containing computer-executable instructions, characterised in that the computer-executable instructions, when executed by a computer processor, perform the method according to any one of claims 1-7.
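For concreteness, the note-ownership test of claims 4 and 5 (and the corresponding device claims 11 and 12) can be read as a simple energy-based segmentation at each frequency point. The sketch below is one such reading in Python; it assumes the comparison window is every frame already assigned to the current note, since the claims spell out only the first two windows. It is an illustrative interpretation, not the patented implementation.

```python
import numpy as np

def assign_notes(energies):
    """Assign each frame at one frequency point to a note index.

    A frame joins the current note when its energy deviates from the
    note's mean energy ("first difference") by less than the note's
    max-minus-min energy spread ("second difference"); otherwise it
    starts a new note together with the frame that follows it.
    """
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    labels = np.zeros(n, dtype=int)   # frames 0 and 1 seed the first note
    note, start, i = 0, 0, 2
    while i < n:
        seg = energies[start:i]                    # frames of the current note so far
        first_diff = energies[i] - seg.mean()      # claim 4's "first difference"
        second_diff = seg.max() - seg.min()        # claim 4's "second difference"
        if abs(first_diff) < second_diff:
            labels[i] = note                       # frame continues the current note
            i += 1
        else:
            note += 1                              # frame opens a new note,
            labels[i] = note
            if i + 1 < n:                          # and the next frame joins it (claim 5)
                labels[i + 1] = note
            start = i
            i += 2
    return labels
```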
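Claims 3 and 6 (and device claims 10 and 13) then turn the per-note energies into frame-level audio features: frame-wise FFT of a fixed point count, per-frequency-point energy, per-note average energy, normalization of each frame to its note's energy, and removal of weak notes. A minimal sketch follows; the 1024-sample frame length and the threshold value are illustrative assumptions the patent does not fix.

```python
import numpy as np

def frame_level_features(samples, frame_len=1024, threshold=1e-4):
    """Frame-level audio features per claims 3 and 6 (illustrative values)."""
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[:n_frames * frame_len], (n_frames, frame_len))
    # Fixed-point-count FFT per frame, then energy at each frequency point.
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    features = np.zeros_like(energy)
    for k in range(energy.shape[1]):            # handle each frequency point separately
        labels = assign_notes(energy[:, k])     # claims 4-5 segmentation above
        for note in np.unique(labels):
            mask = labels == note
            note_energy = energy[mask, k].mean()   # note energy = mean of its frames
            if note_energy >= threshold:           # filter out weak notes
                features[mask, k] = note_energy    # normalize frames to the note energy
    return features
```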
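Finally, claims 1, 2 and 7 fix only the data flow between the two pre-built models, not their topology. The schematic below reflects that flow under stated assumptions: `load_audio` and the two model objects with `transform` and `predict` methods are hypothetical placeholders standing in for whatever topologies the training steps produce.

```python
def compose(preceding_music_file, binding_model, prediction_model):
    """Two-stage flow of claim 1: frame-level features -> features
    carrying band information -> predicted music. The helpers used
    here are placeholders for this sketch, not the patent's API."""
    samples = load_audio(preceding_music_file)   # hypothetical decoder to raw samples
    features = frame_level_features(samples)
    banded = binding_model.transform(features)   # attach band information (claim 1)
    return prediction_model.predict(banded)      # predicted continuation (claim 1)
```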
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710175115.8A CN107045867B (en) | 2017-03-22 | 2017-03-22 | Automatic composition method and device and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107045867A true CN107045867A (en) | 2017-08-15 |
CN107045867B CN107045867B (en) | 2020-06-02 |
Family
ID=59544865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710175115.8A Active CN107045867B (en) | 2017-03-22 | 2017-03-22 | Automatic composition method and device and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107045867B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040089142A1 (en) * | 2002-11-12 | 2004-05-13 | Alain Georges | Systems and methods for creating, modifying, interacting with and playing musical compositions |
US20110161078A1 (en) * | 2007-03-01 | 2011-06-30 | Microsoft Corporation | Pitch model for noise estimation |
CN104050972A (en) * | 2013-03-14 | 2014-09-17 | 雅马哈株式会社 | Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program |
CN104282300A (en) * | 2013-07-05 | 2015-01-14 | 中国移动通信集团公司 | Non-periodic component syllable model building and speech synthesizing method and device |
CN105374347A (en) * | 2015-09-22 | 2016-03-02 | 中国传媒大学 | A mixed algorithm-based computer-aided composition method for popular tunes in regions south of the Yangtze River |
US20170046973A1 (en) * | 2015-10-27 | 2017-02-16 | Thea Kuddo | Preverbal elemental music: multimodal intervention to stimulate auditory perception and receptive language acquisition |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108538301A (en) * | 2018-02-13 | 2018-09-14 | 吟飞科技(江苏)有限公司 | A kind of intelligent digital musical instrument based on neural network Audiotechnica |
CN108538301B (en) * | 2018-02-13 | 2021-05-07 | 吟飞科技(江苏)有限公司 | Intelligent digital musical instrument based on neural network audio technology |
CN109192187A (en) * | 2018-06-04 | 2019-01-11 | 平安科技(深圳)有限公司 | Composing method, system, computer equipment and storage medium based on artificial intelligence |
CN110660375A (en) * | 2018-06-28 | 2020-01-07 | 北京搜狗科技发展有限公司 | Method, device and equipment for generating music |
CN110660375B (en) * | 2018-06-28 | 2024-06-04 | 北京搜狗科技发展有限公司 | Method, device and equipment for generating music |
CN109285560A (en) * | 2018-09-28 | 2019-01-29 | 北京奇艺世纪科技有限公司 | A kind of music features extraction method, apparatus and electronic equipment |
CN109285560B (en) * | 2018-09-28 | 2021-09-03 | 北京奇艺世纪科技有限公司 | Music feature extraction method and device and electronic equipment |
CN109727590A (en) * | 2018-12-24 | 2019-05-07 | 成都嗨翻屋科技有限公司 | Music generating method and device based on Recognition with Recurrent Neural Network |
CN109872709A (en) * | 2019-03-04 | 2019-06-11 | 湖南工程学院 | A kind of new bent generation method of the low similarity based on note complex network |
Also Published As
Publication number | Publication date |
---|---|
CN107045867B (en) | 2020-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107045867A (en) | Automatic composing method, device and terminal device | |
CN107221326B (en) | Voice awakening method and device based on artificial intelligence and computer equipment | |
US10867618B2 (en) | Speech noise reduction method and device based on artificial intelligence and computer device | |
CN107274906A (en) | Voice information processing method, device, terminal and storage medium | |
KR102128926B1 (en) | Method and device for processing audio information | |
CN105702250B (en) | Speech recognition method and device | |
CN108573694A (en) | Language material expansion and speech synthesis system construction method based on artificial intelligence and device | |
CN107134279A (en) | A kind of voice awakening method, device, terminal and storage medium | |
CN108288468A (en) | Audio recognition method and device | |
CN109036396A (en) | A kind of exchange method and system of third-party application | |
CN111081280B (en) | Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method | |
CN108269567A (en) | For generating the method, apparatus of far field voice data, computing device and computer readable storage medium | |
CN108281138A (en) | Age discrimination model training and intelligent sound exchange method, equipment and storage medium | |
WO2023116660A2 (en) | Model training and tone conversion method and apparatus, device, and medium | |
CN108922564A (en) | Emotion identification method, apparatus, computer equipment and storage medium | |
CN112017650B (en) | Voice control method and device of electronic equipment, computer equipment and storage medium | |
CN110459207A (en) | Wake up the segmentation of voice key phrase | |
CN109785846A (en) | The role recognition method and device of the voice data of monophonic | |
CN107978308A (en) | Karaoke scoring method, device, equipment and storage medium | |
EP4033483A2 (en) | Method and apparatus for testing vehicle-mounted voice device, electronic device and storage medium | |
CN112489623A (en) | Language identification model training method, language identification method and related equipment | |
CN107316635A (en) | Audio recognition method and device, storage medium, electronic equipment | |
CN113053410B (en) | Voice recognition method, voice recognition device, computer equipment and storage medium | |
CN112420079B (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN112562725A (en) | Mixed voice emotion classification method based on spectrogram and capsule network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||