CN107045867A - Automatic composing method, device and terminal device - Google Patents
- Publication number
- CN107045867A CN107045867A CN201710175115.8A CN201710175115A CN107045867A CN 107045867 A CN107045867 A CN 107045867A CN 201710175115 A CN201710175115 A CN 201710175115A CN 107045867 A CN107045867 A CN 107045867A
- Authority
- CN
- China
- Prior art keywords
- frame
- music
- note
- energy value
- difference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10H—ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
- G10H1/00—Details of electrophonic musical instruments
- G10H1/0008—Associated control or indicating means
- G10H1/0025—Automatic or semi-automatic music composition, e.g. producing random music, applying rules from music theory or modifying a musical piece
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
Abstract
The application proposes an automatic composing method, device and terminal device. The automatic composing method includes: receiving the music file of a preceding music segment, where the music file includes the audio data or the music description information of the preceding segment; extracting the frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, thereby realizing automatic composition. The application can realize automatic composition, improve the efficiency and feasibility of automatic composition, and reduce the influence of subjective factors on automatic composition.
Description
Technical field
The application relates to the technical field of audio signal processing, and in particular to an automatic composing method, device and terminal device.
Background art
With the application of computer technology to music processing, computer music has emerged. As a newly born art form, computer music has gradually penetrated into composition, instrument performance, education, entertainment and other aspects of music. Automatic composition based on artificial intelligence is a relatively new research direction within computer music and has received great attention from researchers in related fields in recent years.
Existing automatic composing methods based on artificial intelligence mainly fall into two categories: automatic composition based on heuristic search and automatic composition based on genetic algorithms. However, heuristic-search-based automatic composition is only applicable to short pieces, because its search efficiency declines exponentially with the length of the piece, so the method is infeasible for longer music. Automatic composition based on genetic algorithms inherits typical shortcomings of genetic algorithms, for example a strong dependence on the initial population and the difficulty of precisely selecting genetic operators.
Content of the invention
The purpose of the application is to solve, at least to some extent, one of the technical problems in the related art.
Therefore, the first purpose of the application is to propose an automatic composing method. By building a music frequency-band feature combination model and a music prediction model, the method realizes automatic composition. It is a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
The second purpose of the application is to propose an automatic composition device.
The third purpose of the application is to propose a terminal device.
The fourth purpose of the application is to propose a storage medium containing computer-executable instructions.
To achieve these goals, the automatic composing method of the first-aspect embodiment of the application includes: receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment; extracting the frame-level audio features of the music corresponding to the music file; obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model; and obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composing method of the embodiment of the application, after the music file of the preceding music segment is received, the frame-level audio features of the music corresponding to the music file are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thereby realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the automatic composition device of the second-aspect embodiment of the application includes: a receiving module for receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment; an extraction module for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module; and an obtaining module for obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model, and for obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composition device of the embodiment of the application, after the receiving module receives the music file of the preceding music segment, the extraction module extracts the frame-level audio features of the corresponding music; the obtaining module then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thereby be realized, the efficiency and feasibility of automatic composition can be improved, and the influence of subjective factors on automatic composition is reduced.
To achieve these goals, the terminal device of the third-aspect embodiment of the application includes: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described above.
To achieve these goals, the fourth-aspect embodiment of the application provides a storage medium containing computer-executable instructions, the computer-executable instructions being used, when executed by a computer processor, to perform the method described above.
Additional aspects and advantages of the application will be set forth in part in the following description, will in part become apparent from it, or will be learned through practice of the application.
Brief description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application;
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application;
Fig. 3 is a schematic diagram of one embodiment of the topological structure in the automatic composing method of the application;
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 5 is a schematic coordinate representation of energy values in the automatic composing method of the application;
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application;
Fig. 8 is a schematic diagram of another embodiment of the topological structure in the automatic composing method of the application;
Fig. 9 is a schematic structural diagram of one embodiment of the automatic composition device of the application;
Fig. 10 is a schematic structural diagram of another embodiment of the automatic composition device of the application;
Fig. 11 is a schematic structural diagram of one embodiment of the terminal device of the application.
Embodiment
Embodiments of the application are described in detail below, examples of which are shown in the accompanying drawings, in which the same or similar reference numbers throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the application, and are not to be construed as limitations on the application. On the contrary, the embodiments of the application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a flow chart of one embodiment of the automatic composing method of the application. As shown in Fig. 1, the automatic composing method can include:
Step 101: receive the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment.
Here, the audio data or music description information of the preceding segment means the audio data or music description information of a given short piece of music; the music that follows it can then be predicted from this given audio data or music description information.
The music description information can generally be converted to audio data; it can be, for example, a Musical Instrument Digital Interface (MIDI) file.
Step 102: extract the frame-level audio features of the music corresponding to the music file.
Step 103: obtain frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model.
Step 104: obtain the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composing method above, after the music file of the preceding music segment is received, the frame-level audio features of the corresponding music are extracted; frame-level audio features carrying frequency-band information are then obtained according to those features and the pre-built music frequency-band feature combination model; finally, the predicted music is obtained according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition is thereby realized, its efficiency and feasibility can be improved, and the influence of subjective factors on automatic composition is reduced.
Fig. 2 is a flow chart of another embodiment of the automatic composing method of the application. As shown in Fig. 2, before step 103 the method can further include:
Step 201: collect music files and convert them to audio files of the same format.
Specifically, a large amount of training data can be obtained by crawling music files from the internet. These music files can be audio data or music description information, for example MIDI files. The collected files can then be converted to audio files of the same format; the format only needs to support the Fast Fourier Transform (FFT), for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and is illustrated taking the ".PCM" format as an example. It should be noted that if a collected music file is music description information such as a MIDI file, the MIDI file first needs to be converted to an audio file, which is then converted to a ".PCM" audio file.
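As a minimal sketch of step 201 (assuming the ffmpeg command-line tool is available; the 16 kHz mono 16-bit target and the file paths are illustrative, not specified by the patent), collected audio files could be normalized to raw ".PCM" data as follows; rendering MIDI description files to audio first (e.g., with a software synthesizer) is omitted here:

```python
import subprocess

def to_pcm(src: str, dst: str = "out.pcm", rate: int = 16000) -> None:
    """Decode any ffmpeg-readable audio file to raw 16-bit mono PCM."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,                          # decode the input
         "-f", "s16le", "-ac", "1", "-ar", str(rate), dst],  # raw mono PCM output
        check=True)
```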
Step 202: extract the frame-level audio features of the audio files.
Step 203: determine the topological structure of the music frequency-band feature combination model.
Specifically, the topological structure is a neural network structure with two opposing scan directions. This embodiment takes opposing recurrent neural networks (RNN) as an example: the topology consists of two independent RNNs and one connection unit, as shown in Fig. 3, which is a schematic diagram of one embodiment of the topological structure in the automatic composing method of the application. The two independent RNNs, named LF_RNN and HF_RNN, are used for combining multi-bin features over the low-frequency range and over the high-frequency range, respectively.
For a given frame T_m, the input of LF_RNN is the energy values starting from the low-frequency end, E(T_m, F_i), i = 1, 2, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous (lower) frequency bin; its output L_i represents the energy value of the i-th frequency bin of frame T_m after low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy values starting from the high-frequency end, E(T_m, F_j), j = N/2, N/2-1, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous (higher) frequency bin; its output H_j represents the energy value of the j-th frequency bin of frame T_m after high-frequency information has been taken into account.
The connection unit is the concatenate node in Fig. 3: when i = j = k, it combines the two outputs into N(T_m, F_k), the energy value of the k-th frequency bin of frame T_m with information from the other frequency bins taken into account.
Step 204: train the music frequency-band feature combination model according to the determined topological structure and the frame-level audio features.
Specifically, when training the music frequency-band feature combination model, the training algorithm can be a neural-network training algorithm such as back propagation (BP); this embodiment does not limit the training algorithm used.
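For illustration only (assuming PyTorch; the mean-squared-error objective, the optimizer and the supervision targets are assumptions, since the patent only names back propagation and does not specify a loss), training the BandCombinationModel sketched above could proceed as follows:

```python
import torch

def train(model, frames, targets, epochs: int = 10, lr: float = 1e-3) -> None:
    # frames, targets: (num_frames, N/2) tensors of frame-level energies
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(frames), targets)
        loss.backward()   # error back propagation, as the embodiment suggests
        opt.step()
```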
Fig. 4 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 4, in the embodiment of Fig. 2, step 202 can include:
Step 401: apply a fixed-size Fast Fourier Transform to the audio file frame by frame.
Specifically, an FFT with a fixed number of points can be applied to the ".PCM" audio file frame by frame.
Step 402: calculate the energy value of every frame of the audio file at each frequency bin according to the result of the Fast Fourier Transform.
Fig. 5 is a schematic coordinate representation of the energy values in the automatic composing method of the application: it shows the energy value of each frame at each frequency bin, where the horizontal axis t represents the temporal frame, the vertical axis f represents the frequency bin, the coordinate E(t, f) represents the energy value, M is the total number of frames, and N is the number of FFT points.
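A minimal sketch of steps 401 and 402 (assuming NumPy; the frame size, hop size and magnitude-squared energy definition are illustrative assumptions) that builds the M x N/2 energy matrix E(t, f) of Fig. 5:

```python
import numpy as np

def frame_energies(pcm: np.ndarray, n_fft: int = 512, hop: int = 256) -> np.ndarray:
    """Fixed-size FFT per frame, returning E(t, f) with shape (M, N/2)."""
    frames = [pcm[i:i + n_fft] for i in range(0, len(pcm) - n_fft + 1, hop)]
    spectrum = np.fft.rfft(np.asarray(frames, dtype=float), n=n_fft, axis=1)
    return np.abs(spectrum[:, :n_fft // 2]) ** 2  # energy at frame t, bin f
```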
Step 403: determine the note ownership of every frame according to the energy values.
Specifically, at each frequency bin, the first and second frames of the audio file are taken to belong to the first note. It is then judged whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first and second frames, and the second difference is the difference between the maximum and the minimum of the energy values of the first and second frames. If it is, the third frame of the audio file is determined to belong to the first note, and the fourth frame up to the last frame are then judged successively in the same way.
If the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file is taken as the beginning of a second note, and the fourth frame is determined to belong to the second note as well. Starting from the fifth frame, it is judged whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the average of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third and fourth frames. The note ownership of the fifth frame is thus determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
That is, determining the note ownership of every frame can be done by processing each frequency bin as follows. Frames T_1 and T_2 are taken to belong to the first note, and judgment starts from frame T_3: if |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)), then frame T_3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2. Otherwise, frame T_3 is taken as the beginning of the second note, frame T_4 is determined to belong to the second note, and judgment resumes from frame T_5, still by the rule |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)), until the note ownership of all frames has been determined.
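A minimal sketch of this rule (assuming NumPy; one reading of the patent is that each judgment compares a frame against the statistics of the two frames that seeded the current note, which is what is implemented here):

```python
import numpy as np

def assign_notes(energies: np.ndarray) -> list[int]:
    """Return a note index for every frame of one frequency bin."""
    notes = [0, 0]          # frames T_1 and T_2 seed the first note
    start = 0               # index of the current note's first seed frame
    t = 2
    while t < len(energies):
        seed = energies[start:start + 2]
        if abs(energies[t] - seed.mean()) < seed.max() - seed.min():
            notes.append(notes[-1])       # frame continues the current note
            t += 1
        else:
            new = notes[-1] + 1           # frame starts a new note...
            notes.append(new)
            if t + 1 < len(energies):
                notes.append(new)         # ...and the next frame seeds it too
            start = t
            t += 2
    return notes
```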
Step 404: calculate the energy value of each note, and obtain the frame-level audio features according to the energy values of the notes.
Fig. 6 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 6, in the embodiment of Fig. 4, step 404 can include:
Step 601: calculate the average energy value of all frames contained in each note, as the energy value of that note.
Step 602: normalize the energy value of every frame included in each note to the energy value of the note it belongs to.
Step 603: filter out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features.
The predetermined threshold can be set according to system performance and/or implementation requirements at the time of implementation; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of the frames of each note can therefore be computed as the note's energy value E(i), and the energy value of every frame included in a note is then normalized to the energy value of that note. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy E_mean, since notes with small energy values are probably noise. That is, for each E(i), if E(i) < α·E_mean, the energy value of that note can be set to 0, where α·E_mean is the predetermined threshold mentioned above; the value of α can be determined according to the practical application situation, and this embodiment does not limit it.
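A minimal sketch of steps 601 to 603 (assuming NumPy; the value of α is illustrative, since the patent leaves it implementation-defined):

```python
import numpy as np

def note_level_features(energies, notes, alpha: float = 0.1) -> np.ndarray:
    """Average per note, filter quiet notes, and map back to frames."""
    energies, notes = np.asarray(energies, dtype=float), np.asarray(notes)
    note_energy = {n: energies[notes == n].mean() for n in np.unique(notes)}
    e_mean = np.mean(list(note_energy.values()))   # average note energy E_mean
    for n, e in note_energy.items():
        if e < alpha * e_mean:                     # probably noise: suppress
            note_energy[n] = 0.0
    # every frame's feature becomes the (filtered) energy of its note
    return np.array([note_energy[n] for n in notes])
```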
It should be noted that in the embodiment shown in Fig. 2, steps 201 to 204 can be performed sequentially with steps 101 to 102 or in parallel with them; the embodiment of the application does not limit this.
Fig. 7 is a flow chart of a further embodiment of the automatic composing method of the application. As shown in Fig. 7, in the embodiment of Fig. 1, before step 104 the method can further include:
Step 701: determine the topological structure of the music prediction model.
In this embodiment the music prediction model uses an RNN model, as shown in Fig. 8, which is a schematic diagram of another embodiment of the topological structure in the automatic composing method of the application. The input of the RNN model shown in Fig. 8 is the output N(T_m, F_k) of the music frequency-band feature combination model together with the model output h_m for the previous frame, and its output is the energy values N(T_{m+1}, F_k) of the next frame.
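A minimal sketch of this prediction topology (assuming PyTorch; the hidden size and linear output head are illustrative assumptions), consuming the combined band features of each frame and predicting those of the next frame:

```python
import torch
import torch.nn as nn

class MusicPredictor(nn.Module):
    """RNN predicting next-frame band features N(T_{m+1}, F_k)."""

    def __init__(self, n_bins: int, hidden_size: int = 128):
        super().__init__()
        self.rnn = nn.RNN(input_size=n_bins, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_bins)

    def forward(self, band_feats, h=None):
        # band_feats: (batch, frames, N/2) sequence of N(T_m, F_k)
        states, h = self.rnn(band_feats, h)   # h carries the previous frames
        return self.out(states), h            # predicted next-frame energies

# To compose, one would presumably feed each predicted frame back in as the
# next input and roll the model out frame by frame (an assumption about how
# the predicted music is generated from the model).
```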
Step 702: train the music prediction model according to the output of the music frequency-band feature combination model and the determined topological structure.
It should be noted that steps 701 and 702 can be performed sequentially with steps 101 to 103 or in parallel with them; this embodiment does not limit this.
The automatic composing method above can realize automatic composition, improve its efficiency and feasibility, and reduce the influence of subjective factors on automatic composition. It is a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 9 is a schematic structural diagram of one embodiment of the automatic composition device of the application. The automatic composition device in this embodiment can serve as a terminal device, or as part of a terminal device, to realize the automatic composing method provided by the application. The terminal device can be a client device or a server device; the application does not limit the form of the terminal device.
As shown in Fig. 9, the automatic composition device can include: a receiving module 91, an extraction module 92 and an obtaining module 93.
The receiving module 91 is used for receiving the music file of a preceding music segment, the music file including the audio data or the music description information of the preceding segment. Here, the audio data or music description information of the preceding segment means the audio data or music description information of a given short piece of music, from which the music that follows can be predicted. The music description information can generally be converted to audio data and can be, for example, a MIDI file.
The extraction module 92 is used for extracting the frame-level audio features of the music corresponding to the music file received by the receiving module 91.
The obtaining module 93 is used for obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model, and for obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
In the automatic composition device above, after the receiving module 91 receives the music file of the preceding music segment, the extraction module 92 extracts the frame-level audio features of the corresponding music; the obtaining module 93 then obtains frame-level audio features carrying frequency-band information according to those features and the pre-built music frequency-band feature combination model, and obtains the predicted music according to the frame-level audio features carrying frequency-band information and the pre-built music prediction model. Automatic composition can thereby be realized, its efficiency and feasibility can be improved, and the influence of subjective factors on automatic composition is reduced.
Fig. 10 is a schematic structural diagram of another embodiment of the automatic composition device of the application. Compared with the device shown in Fig. 9, the difference is that the device shown in Fig. 10 can further include: a collection module 94, a conversion module 95, a determining module 96 and a training module 97.
The collection module 94 is used for collecting music files before the obtaining module 93 obtains the frame-level audio features carrying frequency-band information.
The conversion module 95 is used for converting the music files collected by the collection module 94 to audio files of the same format. Specifically, the collection module 94 can obtain a large amount of training data by crawling music files from the internet; the music files can be audio data or music description information, for example MIDI files. The conversion module 95 can then convert the music files to audio files of the same format; the format only needs to support the FFT, for example ".PCM" or ".WAV". This embodiment does not limit the format of the audio files and is illustrated taking the ".PCM" format as an example. It should be noted that if a music file is music description information such as a MIDI file, the MIDI file first needs to be converted to an audio file, which is then converted to a ".PCM" audio file.
The extraction module 92 is further used for extracting the frame-level audio features of the audio files converted by the conversion module 95.
The determining module 96 is used for determining the topological structure of the music frequency-band feature combination model. Specifically, the topological structure determined by the determining module 96 is a neural network structure with two opposing scan directions; this embodiment takes opposing RNNs as an example. The topology consists of two independent RNNs and one connection unit, as shown in Fig. 3. The two independent RNNs, named LF_RNN and HF_RNN, are used for combining multi-bin features over the low-frequency range and over the high-frequency range, respectively.
For a given frame T_m, the input of LF_RNN is the energy values starting from the low-frequency end, E(T_m, F_i), i = 1, 2, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output L_{i-1} of LF_RNN at the previous (lower) frequency bin; its output L_i represents the energy value of the i-th frequency bin of frame T_m after low-frequency information has been taken into account.
Similarly, the input of HF_RNN is the energy values starting from the high-frequency end, E(T_m, F_j), j = N/2, N/2-1, ..., k (k = 1, 2, ..., N/2, where N is the number of FFT points), together with the output H_{j+1} of HF_RNN at the previous (higher) frequency bin; its output H_j represents the energy value of the j-th frequency bin of frame T_m after high-frequency information has been taken into account.
The connection unit is the concatenate node in Fig. 3: when i = j = k, it combines the two outputs into N(T_m, F_k), the energy value of the k-th frequency bin of frame T_m with information from the other frequency bins taken into account.
The training module 97 is used for training the music frequency-band feature combination model according to the topological structure determined by the determining module 96 and the frame-level audio features extracted by the extraction module 92. Specifically, when training the music frequency-band feature combination model, the training module 97 can use a neural-network-model training algorithm such as the BP algorithm; this embodiment does not limit the training algorithm used.
In this embodiment, the extraction module 92 can include: a transformation submodule 921, a calculating submodule 922, a determination submodule 923 and an acquisition submodule 924.
The transformation submodule 921 is used for applying a fixed-size Fast Fourier Transform to the audio file frame by frame; specifically, it can apply an FFT with a fixed number of points to the ".PCM" audio file frame by frame.
The calculating submodule 922 is used for calculating the energy value of every frame of the audio file at each frequency bin according to the result of the Fast Fourier Transform by the transformation submodule 921. Fig. 5 shows the energy value of each frame at each frequency bin in coordinate form, where the horizontal axis t represents the temporal frame, the vertical axis f represents the frequency bin, the coordinate E(t, f) represents the energy value, M is the total number of frames, and N is the number of FFT points.
The determination submodule 923 is used for determining the note ownership of every frame according to the energy values calculated by the calculating submodule 922.
The calculating submodule 922 is further used for calculating the energy value of each note.
The acquisition submodule 924 is used for obtaining the frame-level audio features according to the energy values of the notes calculated by the calculating submodule 922.
Specifically, the calculating submodule 922 calculates the average energy value of all frames contained in each note, as the energy value of that note, and normalizes the energy value of every frame included in each note to the energy value of the note it belongs to.
The acquisition submodule 924 filters out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features. The predetermined threshold can be set according to system performance and/or implementation requirements at the time of implementation; this embodiment does not limit its size.
In this embodiment, the energy of a note is defined as the average energy of all frames contained in the note. The average energy of the frames of each note can therefore be computed as the note's energy value E(i), and the energy value of every frame included in a note is then normalized to the energy value of that note. Further, after the energy value of each note has been calculated, notes with too small an energy value can be filtered out according to the average note energy E_mean, since notes with small energy values are probably noise. That is, for each E(i), if E(i) < α·E_mean, the energy value of that note can be set to 0, where α·E_mean is the predetermined threshold mentioned above; the value of α can be determined according to the practical application situation, and this embodiment does not limit it.
In this embodiment, the determination submodule 923 can include: a note determining unit 9231 and a judging unit 9232.
The note determining unit 9231 is used for determining, at each frequency bin, that the first and second frames of the audio file belong to the first note.
The judging unit 9232 is used for judging whether the absolute value of a first difference is less than a second difference, where the first difference is the difference between the energy value of the third frame of the audio file and the average of the energy values of the first and second frames, and the second difference is the difference between the maximum and the minimum of the energy values of the first and second frames.
The note determining unit 9231 is further used for determining, when the absolute value of the first difference is less than the second difference, that the third frame of the audio file belongs to the first note, and then judging the fourth frame up to the last frame successively for note ownership.
The note determining unit 9231 is further used for taking, when the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note.
The judging unit 9232 is further used for judging, starting from the fifth frame of the audio file, whether the absolute value of a third difference is less than a fourth difference, where the third difference is the difference between the energy value of the fifth frame and the average of the energy values of the third and fourth frames, and the fourth difference is the difference between the maximum and the minimum of the energy values of the third and fourth frames; the note ownership of the fifth frame is determined in the same way as that of the third frame, and so on, until the note ownership of the last frame of the audio file has been determined.
That is, the determination submodule 923 can determine the note ownership of every frame by processing each frequency bin as follows. The note determining unit 9231 takes frames T_1 and T_2 as belonging to the first note, and the judging unit 9232 starts judging from frame T_3: if |E(T_3, F_1) - E_mean(T_1, T_2)| < (E_max(T_1, T_2) - E_min(T_1, T_2)), then frame T_3 belongs to the first note, and the ownership of each subsequent frame is judged in turn, where E_mean(T_1, T_2), E_max(T_1, T_2) and E_min(T_1, T_2) denote the average, maximum and minimum of the energy values of frames T_1 to T_2. Otherwise frame T_3 is taken as the beginning of the second note, frame T_4 is determined to belong to the second note, and judgment resumes from frame T_5, still by the rule |E(T_5, F_1) - E_mean(T_3, T_4)| < (E_max(T_3, T_4) - E_min(T_3, T_4)), until the note ownership of all frames has been determined.
Further, the automatic composition device can also include: the determining module 96 and the training module 97.
The determining module 96 is used for determining the topological structure of the music prediction model before the obtaining module 93 obtains the predicted music. In this embodiment, the topological structure of the music prediction model determined by the determining module 96 is an RNN model, as shown in Fig. 8: the input of the RNN model is the output N(T_m, F_k) of the music frequency-band feature combination model together with the model output h_m for the previous frame, and its output is the energy values N(T_{m+1}, F_k) of the next frame.
The training module 97 is used for training the music prediction model according to the output of the music frequency-band feature combination model and the topological structure determined by the determining module 96.
The automatic composition device above can realize automatic composition, improve its efficiency and feasibility, and reduce the influence of subjective factors on automatic composition. It embodies a brand-new automatic composing method that solves the problems of low efficiency, poor feasibility and strong subjective influence in the prior art.
Fig. 11 is a schematic structural diagram of one embodiment of the terminal device of the application. The terminal device in the application can realize the automatic composing method provided by the application; it can be a client device or a server device, and the application does not limit its form. The terminal device can include: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors realize the automatic composing method provided by the application.
Fig. 11 shows a block diagram of an exemplary terminal device 12 suitable for realizing embodiments of the application. The terminal device 12 shown in Fig. 11 is only an example and should not bring any limitation to the function and scope of use of the embodiments of the application.
As shown in Fig. 11, the terminal device 12 takes the form of a general-purpose computing device. The components of the terminal device 12 can include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics acceleration port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus and the Peripheral Component Interconnect (PCI) bus.
The terminal device 12 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the terminal device 12, including volatile and non-volatile media, removable and non-removable media.
The system memory 28 can include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The terminal device 12 can further comprise other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 can be used for reading and writing non-removable, non-volatile magnetic media (not shown in Fig. 11, commonly referred to as a "hard disk drive"). Although not shown in Fig. 11, a magnetic disk drive for reading and writing a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disk drive for reading and writing a removable non-volatile optical disk (for example a Compact Disc Read Only Memory (CD-ROM), a Digital Video Disc Read Only Memory (DVD-ROM) or other optical media) can be provided. In these cases, each drive can be connected to the bus 18 through one or more data media interfaces. The memory 28 can include at least one program product having a group of (for example, at least one) program modules configured to perform the functions of the embodiments of the application.
A program/utility 40 having a group of (at least one) program modules 42 can be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the automatic composing method in the embodiments described herein.
The terminal device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device or a display 24), with one or more devices that enable a user to interact with the terminal device 12, and/or with any device (such as a network card or a modem) that enables the terminal device 12 to communicate with one or more other computing devices. This communication can be carried out through an input/output (I/O) interface 22. Moreover, the terminal device 12 can also communicate with one or more networks (such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the internet) through a network adapter 20. As shown in Fig. 11, the network adapter 20 communicates with the other modules of the terminal device 12 through the bus 18. It should be understood that, although not shown in Fig. 11, other hardware and/or software modules can be used in conjunction with the terminal device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems and so on.
The processing unit 16 performs various functional applications and data processing by running the programs stored in the system memory 28, for example realizing the automatic composing method provided by the application.
It should be noted that in the description of the application, the terms "first", "second" and the like are only used for descriptive purposes and are not to be understood as indicating or implying relative importance. In addition, in the description of the application, unless otherwise indicated, "multiple" means two or more.
Any process or method description in a flow chart or otherwise described herein can be understood as representing a module, fragment or portion of code that includes one or more executable instructions for realizing the steps of a specific logical function or process, and the scope of the preferred embodiments of the application includes other realizations, in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order according to the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
It should be appreciated that each part of the application can be realized with hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be realized with software or firmware that is stored in a memory and performed by a suitable instruction execution system. For example, if realized with hardware, as in another embodiment, they can be realized with any one of the following technologies well known in the art or a combination thereof: a discrete logic circuit having logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA) and so on.
The application also provides a storage medium containing computer-executable instructions, the computer-executable instructions being used, when executed by a computer processor, to perform the automatic composing method provided by the application.
The storage medium containing computer-executable instructions can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium can include a data signal propagated in a baseband or as part of a carrier wave, with computer-readable program code carried therein. Such a propagated data signal can take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; such a medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted with any appropriate medium, including, but not limited to, wireless, electric wire, optical cable, RF and so on, or any suitable combination of the above.
Computer program code for performing the operations of the application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as an independent software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In situations involving a remote computer, the remote computer can be connected to the user's computer through a network of any kind, including a Local Area Network (LAN) or a Wide Area Network (WAN), or it can be connected to an external computer (for example through the internet using an internet service provider).
Those skilled in the art can appreciate that all or part of the steps carried by the method of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium, and the program, upon execution, includes one or a combination of the steps of the method embodiments.
In addition, each functional module in each embodiment of the application can be integrated in one processing module, each module can exist separately and physically, or two or more modules can be integrated in one module. The integrated module can be realized in the form of hardware or in the form of a software functional module. If the integrated module is realized in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disk or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that specific features, structures, materials or characteristics described in combination with the embodiment or example are contained in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in an appropriate manner in any one or more embodiments or examples.
Although the embodiments of the application have been shown and described above, it is to be understood that the above embodiments are exemplary and cannot be interpreted as limitations on the application; one of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the application.
Claims (16)
1. An automatic composing method, characterised by including:
receiving the music file of a preceding music segment, the music file of the preceding music segment including the audio data or the music description information of the preceding music segment;
extracting the frame-level audio features of the music corresponding to the music file;
obtaining frame-level audio features carrying frequency-band information according to the frame-level audio features and a pre-built music frequency-band feature combination model;
obtaining the predicted music according to the frame-level audio features carrying frequency-band information and a pre-built music prediction model, so as to realize automatic composition.
2. The method according to claim 1, characterised in that, before the obtaining of the frame-level audio features carrying frequency-band information according to the frame-level audio features and the pre-built music frequency-band feature combination model, the method further includes:
collecting music files and converting the music files to audio files of the same format;
extracting the frame-level audio features of the audio files;
determining the topological structure of the music frequency-band feature combination model;
training the music frequency-band feature combination model according to the determined topological structure and the frame-level audio features.
3. The method according to claim 2, characterised in that extracting the frame-level audio features of the audio file comprises:
performing a fast Fourier transform of a fixed point count on the audio file frame by frame;
calculating the energy value of each frame of the audio file at each frequency point from the result of the fast Fourier transform;
determining the note to which each frame belongs according to the energy values; and
calculating the energy value of each note and obtaining the frame-level audio features from the energy values of the notes.
4. The method according to claim 3, characterised in that determining the note to which each frame belongs according to the energy values comprises:
at each frequency point, determining that the first frame and the second frame of the audio file belong to a first note;
judging whether the absolute value of a first difference is less than a second difference, the first difference being the difference between the energy value of the third frame of the audio file and the average energy value of the first frame through the second frame, and the second difference being the difference between the maximum and the minimum of the energy values of the first frame through the second frame; and
if so, determining that the third frame of the audio file belongs to the first note, and then judging the fourth frame and subsequent frames in turn until the note ownership of the last frame is determined (one reading of this test is sketched after the claims).
5. The method according to claim 4, characterised in that, after judging whether the absolute value of the first difference is less than the second difference, the method further comprises:
if the absolute value of the first difference is greater than or equal to the second difference, taking the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note; and
judging, from the fifth frame of the audio file onward, whether the absolute value of a third difference is less than a fourth difference, the third difference being the difference between the energy value of the fifth frame and the average energy value of the third frame through the fourth frame, and the fourth difference being the difference between the maximum and the minimum of the energy values of the third frame through the fourth frame; and so on, until the note ownership of the last frame of the audio file is determined.
6. The method according to claim 3, characterised in that calculating the energy value of each note and obtaining the frame-level audio features from the energy values of the notes comprises:
calculating the average energy value of all frames contained in each note as the energy value of that note;
normalizing the energy value of every frame contained in a note to the energy value of that note; and
filtering out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features (see the feature-extraction sketch after the claims).
7. The method according to claim 1, characterised in that, before obtaining the predicted music according to the frame-level audio features carrying band information and the pre-built music prediction model, the method may further comprise:
determining a topological structure of the music prediction model; and
training the music prediction model according to the output of the music band feature binding model and the determined topological structure.
8. An automatic composing device, characterised in that it comprises:
a receiving module for receiving a music file of the preceding music segment to be predicted from, the music file comprising the audio data or the music description information of that segment;
an extraction module for extracting frame-level audio features of the music corresponding to the music file received by the receiving module; and
an obtaining module for obtaining frame-level audio features carrying band information according to the frame-level audio features and a pre-built music band feature binding model, and for obtaining predicted music according to the frame-level audio features carrying band information and a pre-built music prediction model, so as to realize automatic composition.
9. The device according to claim 8, characterised in that it further comprises a collection module, a conversion module, a determining module, and a training module;
the collection module being for collecting music files before the obtaining module obtains the frame-level audio features carrying band information;
the conversion module being for converting the music files collected by the collection module into audio files of the same format;
the extraction module being further for extracting the frame-level audio features of the audio files converted by the conversion module;
the determining module being for determining a topological structure of the music band feature binding model; and
the training module being for training the music band feature binding model according to the topological structure determined by the determining module and the frame-level audio features extracted by the extraction module.
10. The device according to claim 9, characterised in that the extraction module comprises:
a transformation submodule for performing a fast Fourier transform of a fixed point count on the audio file frame by frame;
a calculation submodule for calculating the energy value of each frame of the audio file at each frequency point from the result of the transformation submodule's fast Fourier transform;
a determination submodule for determining the note to which each frame belongs according to the energy values calculated by the calculation submodule;
the calculation submodule being further for calculating the energy value of each note; and
an acquisition submodule for obtaining the frame-level audio features from the energy values of the notes calculated by the calculation submodule.
11. The device according to claim 10, characterised in that the determination submodule comprises:
a note determining unit for determining, at each frequency point, that the first frame and the second frame of the audio file belong to a first note; and
a judging unit for judging whether the absolute value of a first difference is less than a second difference, the first difference being the difference between the energy value of the third frame of the audio file and the average energy value of the first frame through the second frame, and the second difference being the difference between the maximum and the minimum of the energy values of the first frame through the second frame;
the note determining unit being further for determining, when the absolute value of the first difference is less than the second difference, that the third frame of the audio file belongs to the first note, and then judging the fourth frame and subsequent frames in turn until the note ownership of the last frame is determined.
12. The device according to claim 11, characterised in that:
the note determining unit is further for taking, when the absolute value of the first difference is greater than or equal to the second difference, the third frame of the audio file as the beginning of a second note, and determining that the fourth frame of the audio file belongs to the second note; and
the judging unit is further for judging, from the fifth frame of the audio file onward, whether the absolute value of a third difference is less than a fourth difference, the third difference being the difference between the energy value of the fifth frame and the average energy value of the third frame through the fourth frame, and the fourth difference being the difference between the maximum and the minimum of the energy values of the third frame through the fourth frame, until the note ownership of the last frame of the audio file is determined.
13. The device according to claim 10, characterised in that:
the calculation submodule is specifically for calculating the average energy value of all frames contained in each note as the energy value of that note, and for normalizing the energy value of every frame contained in a note to the energy value of that note; and
the acquisition submodule is specifically for filtering out the notes whose energy value is less than a predetermined threshold, to obtain the frame-level audio features.
14. The device according to claim 8, characterised in that it further comprises a determining module and a training module;
the determining module being for determining a topological structure of the music prediction model before the obtaining module obtains the predicted music; and
the training module being for training the music prediction model according to the output of the music band feature binding model and the topological structure determined by the determining module.
15. A terminal device, characterised in that it comprises:
one or more processors; and
a storage device for storing one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
16. A storage medium containing computer-executable instructions, characterised in that the computer-executable instructions, when executed by a computer processor, perform the method according to any one of claims 1-7.
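For concreteness, the note-ownership test of claims 4 and 5 (and the corresponding device claims 11 and 12) can be read as a simple energy-based segmentation at each frequency point. The sketch below is one such reading in Python; it assumes the comparison window is every frame already assigned to the current note, since the claims spell out only the first two windows. It is an illustrative interpretation, not the patented implementation.

```python
import numpy as np

def assign_notes(energies):
    """Assign each frame at one frequency point to a note index.

    A frame joins the current note when its energy deviates from the
    note's mean energy ("first difference") by less than the note's
    max-minus-min energy spread ("second difference"); otherwise it
    starts a new note together with the frame that follows it.
    """
    energies = np.asarray(energies, dtype=float)
    n = len(energies)
    labels = np.zeros(n, dtype=int)   # frames 0 and 1 seed the first note
    note, start, i = 0, 0, 2
    while i < n:
        seg = energies[start:i]                    # frames of the current note so far
        first_diff = energies[i] - seg.mean()      # claim 4's "first difference"
        second_diff = seg.max() - seg.min()        # claim 4's "second difference"
        if abs(first_diff) < second_diff:
            labels[i] = note                       # frame continues the current note
            i += 1
        else:
            note += 1                              # frame opens a new note,
            labels[i] = note
            if i + 1 < n:                          # and the next frame joins it (claim 5)
                labels[i + 1] = note
            start = i
            i += 2
    return labels
```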
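Claims 3 and 6 (and device claims 10 and 13) then turn the per-note energies into frame-level audio features: frame-wise FFT of a fixed point count, per-frequency-point energy, per-note average energy, normalization of each frame to its note's energy, and removal of weak notes. A minimal sketch follows; the 1024-sample frame length and the threshold value are illustrative assumptions the patent does not fix.

```python
import numpy as np

def frame_level_features(samples, frame_len=1024, threshold=1e-4):
    """Frame-level audio features per claims 3 and 6 (illustrative values)."""
    n_frames = len(samples) // frame_len
    frames = np.reshape(samples[:n_frames * frame_len], (n_frames, frame_len))
    # Fixed-point-count FFT per frame, then energy at each frequency point.
    energy = np.abs(np.fft.rfft(frames, axis=1)) ** 2

    features = np.zeros_like(energy)
    for k in range(energy.shape[1]):            # handle each frequency point separately
        labels = assign_notes(energy[:, k])     # claims 4-5 segmentation above
        for note in np.unique(labels):
            mask = labels == note
            note_energy = energy[mask, k].mean()   # note energy = mean of its frames
            if note_energy >= threshold:           # filter out weak notes
                features[mask, k] = note_energy    # normalize frames to the note energy
    return features
```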
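Finally, claims 1, 2 and 7 fix only the data flow between the two pre-built models, not their topology. The schematic below reflects that flow under stated assumptions: `load_audio` and the two model objects with `transform` and `predict` methods are hypothetical placeholders standing in for whatever topologies the training steps produce.

```python
def compose(preceding_music_file, binding_model, prediction_model):
    """Two-stage flow of claim 1: frame-level features -> features
    carrying band information -> predicted music. The helpers used
    here are placeholders for this sketch, not the patent's API."""
    samples = load_audio(preceding_music_file)   # hypothetical decoder to raw samples
    features = frame_level_features(samples)
    banded = binding_model.transform(features)   # attach band information (claim 1)
    return prediction_model.predict(banded)      # predicted continuation (claim 1)
```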
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710175115.8A CN107045867B (en) | 2017-03-22 | 2017-03-22 | Automatic composition method and device and terminal equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107045867A true CN107045867A (en) | 2017-08-15 |
CN107045867B CN107045867B (en) | 2020-06-02 |
Family
ID=59544865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710175115.8A Active CN107045867B (en) | 2017-03-22 | 2017-03-22 | Automatic composition method and device and terminal equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107045867B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040089142A1 (en) * | 2002-11-12 | 2004-05-13 | Alain Georges | Systems and methods for creating, modifying, interacting with and playing musical compositions |
US20110161078A1 (en) * | 2007-03-01 | 2011-06-30 | Microsoft Corporation | Pitch model for noise estimation |
CN104050972A (en) * | 2013-03-14 | 2014-09-17 | 雅马哈株式会社 | Sound signal analysis apparatus, sound signal analysis method and sound signal analysis program |
CN104282300A (en) * | 2013-07-05 | 2015-01-14 | 中国移动通信集团公司 | Non-periodic component syllable model building and speech synthesizing method and device |
CN105374347A (en) * | 2015-09-22 | 2016-03-02 | 中国传媒大学 | A mixed algorithm-based computer-aided composition method for popular tunes in regions south of the Yangtze River |
US20170046973A1 (en) * | 2015-10-27 | 2017-02-16 | Thea Kuddo | Preverbal elemental music: multimodal intervention to stimulate auditory perception and receptive language acquisition |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108538301A (en) * | 2018-02-13 | 2018-09-14 | 吟飞科技(江苏)有限公司 | A kind of intelligent digital musical instrument based on neural network Audiotechnica |
CN108538301B (en) * | 2018-02-13 | 2021-05-07 | 吟飞科技(江苏)有限公司 | Intelligent digital musical instrument based on neural network audio technology |
CN109192187A (en) * | 2018-06-04 | 2019-01-11 | 平安科技(深圳)有限公司 | Composing method, system, computer equipment and storage medium based on artificial intelligence |
CN110660375A (en) * | 2018-06-28 | 2020-01-07 | 北京搜狗科技发展有限公司 | Method, device and equipment for generating music |
CN110660375B (en) * | 2018-06-28 | 2024-06-04 | 北京搜狗科技发展有限公司 | Method, device and equipment for generating music |
CN109285560A (en) * | 2018-09-28 | 2019-01-29 | 北京奇艺世纪科技有限公司 | A kind of music features extraction method, apparatus and electronic equipment |
CN109285560B (en) * | 2018-09-28 | 2021-09-03 | 北京奇艺世纪科技有限公司 | Music feature extraction method and device and electronic equipment |
CN109727590A (en) * | 2018-12-24 | 2019-05-07 | 成都嗨翻屋科技有限公司 | Music generating method and device based on Recognition with Recurrent Neural Network |
CN109872709A (en) * | 2019-03-04 | 2019-06-11 | 湖南工程学院 | A kind of new bent generation method of the low similarity based on note complex network |
Also Published As
Publication number | Publication date |
---|---|
CN107045867B (en) | 2020-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107045867A (en) | Automatic composing method, device and terminal device | |
CN107221326B (en) | Voice awakening method and device based on artificial intelligence and computer equipment | |
US10867618B2 (en) | Speech noise reduction method and device based on artificial intelligence and computer device | |
CN107274906A (en) | Voice information processing method, device, terminal and storage medium | |
KR102128926B1 (en) | Method and device for processing audio information | |
CN105702250B (en) | Speech recognition method and device | |
CN108573694A (en) | Language material expansion and speech synthesis system construction method based on artificial intelligence and device | |
CN107134279A (en) | A kind of voice awakening method, device, terminal and storage medium | |
CN108288468A (en) | Audio recognition method and device | |
CN109036396A (en) | A kind of exchange method and system of third-party application | |
CN111081280B (en) | Text-independent speech emotion recognition method and device and emotion recognition algorithm model generation method | |
CN108269567A (en) | For generating the method, apparatus of far field voice data, computing device and computer readable storage medium | |
CN108281138A (en) | Age discrimination model training and intelligent sound exchange method, equipment and storage medium | |
WO2023116660A2 (en) | Model training and tone conversion method and apparatus, device, and medium | |
CN108922564A (en) | Emotion identification method, apparatus, computer equipment and storage medium | |
CN112017650B (en) | Voice control method and device of electronic equipment, computer equipment and storage medium | |
CN110459207A (en) | Wake up the segmentation of voice key phrase | |
CN109785846A (en) | The role recognition method and device of the voice data of monophonic | |
CN107978308A (en) | Karaoke scoring method, device, equipment and storage medium | |
EP4033483A2 (en) | Method and apparatus for testing vehicle-mounted voice device, electronic device and storage medium | |
CN112489623A (en) | Language identification model training method, language identification method and related equipment | |
CN107316635A (en) | Audio recognition method and device, storage medium, electronic equipment | |
CN113053410B (en) | Voice recognition method, voice recognition device, computer equipment and storage medium | |
CN112420079B (en) | Voice endpoint detection method and device, storage medium and electronic equipment | |
CN112562725A (en) | Mixed voice emotion classification method based on spectrogram and capsule network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||