CN101556795B

CN101556795B - Method and device for computing voice fundamental frequency

Info

Publication number: CN101556795B
Application number: CN2008100432334A
Authority: CN
Inventors: 黄鹤云; 林福辉
Original assignee: Spreadtrum Communications Shanghai Co Ltd
Current assignee: Spreadtrum Communications Shanghai Co Ltd
Priority date: 2008-04-09
Filing date: 2008-04-09
Publication date: 2012-07-18
Anticipated expiration: 2028-04-09
Also published as: CN101556795A

Abstract

The invention belongs to the field of signal processing and discloses a method and a device for computing the fundamental frequency of speech, which can estimate the fundamental frequency more accurately. Before the fundamental frequency is computed, the frequency-domain signal in use is reconstructed to generate a reconstruction function with a continuous domain. Near each frequency-domain peak, the function is curve-fitted to the corresponding frequency-domain signal; in the rest of the domain, the corresponding frequency-domain signal is suppressed. When the fundamental frequency is searched for, each candidate fundamental frequency is considered together with several of its multiples.

Description

Method and apparatus for computing the fundamental frequency of speech

Technical field

The present invention relates to the field of speech signal processing, and in particular to techniques for computing the fundamental frequency of speech.

Background technology

With the rapid development of network and multimedia technology, speech processing systems have spread into fields such as broadcasting, television and communications. From broadcast and television production equipment to handheld phones and portable audio/video players, all rely on speech processing systems.

In speech signal processing and speech coding, correctly estimating the fundamental frequency is extremely important.

From the standpoint of speech production, speech originates from vibration of the vocal cavity, which produces a sound wave; this wave is then modulated by the organs of the vocal tract to yield the speech signal. The vibration usually determines the type of speech sound produced, for example a vowel, a consonant or a fricative. In real speech, vowels account for a large proportion; an English word, for instance, usually consists largely of vowels. From the standpoint of signal analysis, a vowel is composed mainly of harmonics, i.e. its frequency components consist of a fundamental frequency (also called the pitch frequency) and several integer multiples of it.

U.S. Patent No. 4,161,625 discloses a method for obtaining the fundamental frequency from a speech signal. In that patent, the original speech signal is processed to obtain a difference signal, and an autocorrelation algorithm is then applied to the difference signal to obtain the fundamental frequency.

In practical speech coding and speech signal processing algorithms (for example the G.729 speech coding standard), the traditional way of computing the fundamental frequency is mainly the autocorrelation algorithm, i.e. a specific value is located through the maximum autocorrelation coefficient of the speech signal. Because speech signals usually contain a large amount of noise, fundamental frequency computation based on autocorrelation may exhibit some deviation.
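For orientation, the autocorrelation approach described above can be sketched as follows. This is a minimal illustration only, not the procedure of G.729 or of the cited U.S. patent; the function name `autocorr_pitch`, the 50 Hz to 500 Hz search range and the synthetic test frame are assumptions made for the example.

```python
import math

def autocorr_pitch(x, fs, f_lo=50.0, f_hi=500.0):
    """Estimate pitch as fs / lag, where lag maximises the raw autocorrelation."""
    lag_min = int(fs / f_hi)          # shortest period considered
    lag_max = int(fs / f_lo)          # longest period considered
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag
        r = sum(x[t] * x[t + lag] for t in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag

# Hypothetical noise-free test frame: a 200 Hz sine sampled at 8000 Hz
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(400)]
estimate = autocorr_pitch(frame, fs)
```

On a clean sine the lag of one period wins; with noisy speech the maximum can shift, which is the deviation the text refers to.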

Summary of the invention

The object of the present invention is to provide a method and apparatus for computing the fundamental frequency of speech that can estimate the fundamental frequency more accurately.

The invention discloses a method for computing the fundamental frequency of speech, comprising the following steps:

Transforming the time-domain speech signal into a discrete frequency-domain signal X_i, where i = 1, 2, ..., N;

Finding in |X_i| each peak M_j that is a local maximum, where j = 1, 2, ..., L, L is the number of peaks, and | | denotes taking the absolute value;

Constructing, in the domain of the discrete frequency-domain signal, L non-overlapping regions Z_j, each Z_j having a predetermined size and covering one D_j, where D_j is the value in the domain corresponding to M_j;

Constructing, with each Z_j as its domain, a continuous function S_j(ω), ω ∈ Z_j, that satisfies | S_j(ω_i) - |X_i| | < C1, where ω_i is the value in the domain corresponding to X_i and C1 is a positive constant;

Constructing, on the part of the domain not covered by any Z_j, a function S_0(ω), with ω ∈ [0, F_s/2] and ω ∉ Z_j, where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|;

Computing the fundamental frequency using S(ω), formed by combining each S_j(ω) with S_0(ω), as the spectrum.

The invention also discloses an apparatus for computing the fundamental frequency of speech, comprising:

A transform unit, for transforming the time-domain speech signal into a discrete frequency-domain signal X_i, where i = 1, 2, ..., N;

A peak computing unit, for finding in |X_i| each peak M_j that is a local maximum, where j = 1, 2, ..., L, L is the number of peaks, and | | denotes taking the absolute value;

A reconstruction unit, for constructing, in the domain of the discrete frequency-domain signal, L non-overlapping regions Z_j, each Z_j having a predetermined size and covering one D_j, where D_j is the value in the domain corresponding to M_j; for constructing, with each Z_j as its domain, a continuous function S_j(ω), ω ∈ Z_j, that satisfies | S_j(ω_i) - |X_i| | < C1, where ω_i is the value in the domain corresponding to X_i and C1 is a positive constant; and for constructing, on the part of the domain not covered by any Z_j, a function S_0(ω), with ω ∈ [0, F_s/2] and ω ∉ Z_j, where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|;

A pitch computing unit, for computing the fundamental frequency using S(ω), formed by combining each S_j(ω) with S_0(ω), as the spectrum.

Compared with the prior art, the main differences and effects of the embodiments of the present invention are as follows:

Before the fundamental frequency is computed, the frequency-domain signal in use is first reconstructed to generate a reconstruction function with a continuous domain. In the part of the domain near each frequency-domain peak, the function is curve-fitted to the corresponding frequency-domain signal; in the rest of the domain, the corresponding frequency-domain signal is suppressed. Because a candidate fundamental frequency and its multiples usually appear as peaks, retaining the frequency-domain signal near each peak while substantially weakening it elsewhere improves the accuracy and noise immunity of the fundamental frequency computation. The frequency-domain signal obtained by the transform is discrete; making the domain continuous allows the fundamental frequency to be searched more precisely in the spectrum represented by the reconstruction function.

Further, considering a candidate fundamental frequency together with several of its multiples during the pitch search makes the search result more accurate.

Further, the value of the reconstruction function outside the neighbourhoods of the peaks can be set to 0, which weakens irrelevant frequency components to the greatest extent and further improves the accuracy and noise immunity of the fundamental frequency computation.

Description of drawings

Fig. 1 is a flowchart of a method for computing the fundamental frequency of speech according to the first embodiment of the present invention;

Fig. 2 is a structural diagram of an apparatus for computing the fundamental frequency of speech according to the third embodiment of the present invention.

Embodiment

To make the object, technical solution and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

The first embodiment of the present invention relates to a method for computing the fundamental frequency of speech, as shown in Fig. 1.

In step 110, the input speech signal is transformed from the time domain to the frequency domain. Specifically, if the input time-domain speech signal is x_i, i = 1, 2, ..., N, it can be converted by a Fast Fourier Transform (FFT) into a discrete frequency-domain signal X_i, i = 1, 2, ..., N.

Note that this step is described using the FFT as an example of the time-to-frequency transform; in practice it can also be implemented in other ways. For example, the time-domain speech signal can be transformed into a discrete frequency-domain signal by a Discrete Cosine Transform (DCT) or a modified discrete cosine transform.
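As a rough illustration of step 110, the sketch below computes the discrete spectrum with a naive O(N^2) discrete Fourier transform, which yields the same values an FFT would. The function name `dft`, the 8000 Hz sampling rate and the synthetic 200 Hz frame are assumptions for the example, and the list indices are 0-based whereas the text counts from 1.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform; an FFT gives the same result faster."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical frame: a 200 Hz sine sampled at 8000 Hz, 80 samples (100 Hz per bin)
fs = 8000
frame = [math.sin(2 * math.pi * 200 * t / fs) for t in range(80)]

X = dft(frame)                 # discrete frequency-domain signal X_i
Y = [abs(v) for v in X]        # magnitudes |X_i| used by the later steps
peak_bin = max(range(len(Y) // 2), key=Y.__getitem__)  # strongest bin below Nyquist
```

With 100 Hz per bin, the strongest bin in the lower half of the spectrum is the one at 200 Hz.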

Then, in step 120, each peak M_j that is a local maximum is found among the absolute values of the frequency-domain signals X_i, where j = 1, 2, ..., L and L is the number of peaks. For example, the absolute values Y_1, Y_2, ..., Y_N are first obtained from X_1, X_2, ..., X_N, where Y_i = |X_i|, i = 1, 2, ..., N. Local maxima are then searched for: every Y_i satisfying Y_i > max(Y_{i+1}, Y_{i-1}) is taken as a peak M_j. This way of selecting a local maximum in effect picks the maximum among three points; in practice the maximum can also be picked among more points (for example five or six).
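The three-point rule of step 120 can be sketched in a few lines. The function name `find_peaks` and the sample magnitudes are hypothetical, and indices are 0-based.

```python
def find_peaks(Y):
    """Indices i with Y[i] > max(Y[i-1], Y[i+1]), i.e. three-point local maxima."""
    return [i for i in range(1, len(Y) - 1) if Y[i] > max(Y[i - 1], Y[i + 1])]

# Hypothetical magnitude spectrum |X_i|
Y = [0.1, 0.5, 2.0, 0.7, 0.2, 0.3, 1.5, 0.4, 0.4, 0.1]
peaks = find_peaks(Y)
```

Because the comparison is strict, a flat run such as the two 0.4 values is not reported as a peak.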

Then, in step 130, a continuous spectrum of the speech signal is reconstructed from the selected frequency-domain peaks. The frequency-domain signals obtained after the FFT are discrete, so they do not provide a spectrum function with a continuous domain, which makes computing the pitch difficult; the spectrum therefore needs to be reconstructed and made continuous, as follows.

First, the whole spectrum is divided into two types. One type corresponds to the frequency components at the fundamental frequency or at one of its multiples; the other type corresponds to frequency components unrelated to the fundamental frequency. Because the components at the fundamental frequency and its multiples usually each appear as a local maximum in the spectrum, the peaks selected in step 120 can be regarded as representing those components. The remaining parts of the frequency axis are regarded as unrelated frequency components.

Second, function reconstruction is carried out separately for the two types of spectrum. Specifically, in the domain of the discrete frequency-domain signal, L non-overlapping regions Z_j are constructed, one for each of the L peaks; each Z_j has a predetermined size and covers one D_j, where D_j is the value in the domain corresponding to M_j. With each Z_j as its domain, a continuous function S_j(ω), ω ∈ Z_j, is constructed that satisfies | S_j(ω_i) - |X_i| | < C1, where ω_i is the value in the domain corresponding to X_i and C1 is a positive constant. On the part of the domain not covered by any Z_j, a function S_0(ω) is constructed, with ω ∈ [0, F_s/2] and ω ∉ Z_j, where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|. The continuous functions S_j(ω) are described further below.

In this embodiment, the continuous function S_j(ω) is constructed by quadratic (second-order polynomial) interpolation through the absolute value of the frequency-domain signal at the peak and the absolute values of its two neighbouring frequency-domain signals. For example, if the absolute value Y_i corresponding to the first peak M_1 has the domain value ω_i, then its two neighbours (Y_{i-1}, Y_{i+1}) have the domain values (ω_{i-1}, ω_{i+1}). Suppose the interpolating polynomial is the following quadratic trinomial:

f(x) = ax^2 + bx + c

The coefficients {a, b, c} can then be solved by substitution:

[a, b, c] = [Y_{i-1}, Y_{i}, Y_{i+1}] \begin{pmatrix} ω_{i-1}^{2} & ω_{i}^{2} & ω_{i+1}^{2} \\ ω_{i-1} & ω_{i} & ω_{i+1} \\ 1 & 1 & 1 \end{pmatrix}^{-1}

This gives S_1(ω) = a_1 ω^2 + b_1 ω + c_1.

In the same way, each function S_j(ω) can be constructed by quadratic interpolation, i.e. S_j(ω) = a_j ω^2 + b_j ω + c_j.
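The coefficient solve above can be illustrated without an explicit matrix inverse. The sketch below uses Newton divided differences, which yield the same unique quadratic through three points; the function name `fit_parabola` and the sample points are assumptions for the example.

```python
def fit_parabola(w, y):
    """Coefficients (a, b, c) of a*x^2 + b*x + c through three (w, y) points."""
    w0, w1, w2 = w
    y0, y1, y2 = y
    d01 = (y1 - y0) / (w1 - w0)        # first divided differences
    d12 = (y2 - y1) / (w2 - w1)
    a = (d12 - d01) / (w2 - w0)        # second divided difference
    b = d01 - a * (w0 + w1)
    c = y0 - a * w0 * w0 - b * w0
    return a, b, c

# Hypothetical peak at 200 Hz with neighbours at 100 Hz and 300 Hz
w_pts = (100.0, 200.0, 300.0)
y_pts = (1.0, 3.0, 2.0)
a, b, c = fit_parabola(w_pts, y_pts)

def S_1(w):
    return a * w * w + b * w + c
```

Since the curve passes through the three samples, the condition | S_j(ω_i) - |X_i| | < C1 holds at those points for any positive C1, up to rounding error.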

Note that, because in this embodiment each peak M_j is in effect the maximum of three points, if the absolute value corresponding to the first peak M_1 is Y_i, then region Z_1 starts at the domain value corresponding to Y_{i-1} and ends at the domain value corresponding to Y_{i+1}, i.e.

Z_{1} = [\frac{F_{s}}{N} ω_{i-1}, \frac{F_{s}}{N} ω_{i+1}]

Each region Z_j is obtained in the same way. The domain of the fitted curve for each peak can also take any other reasonable length.

For the part of the domain not covered by any Z_j, these regions carry no pitch information, so they can simply be replaced by an arbitrary function S_0(ω), with ω ∈ [0, F_s/2] and ω ∉ Z_j. The function S_0(ω) only needs to satisfy the condition S_0(ω_i) < |X_i|. For example, the zero function can be used, i.e. S_0(ω) = 0.

Because both types of spectrum have been reconstructed as functions in this step, the whole spectrum is reconstructed as a single function with a continuous domain, namely:

S(ω) = \begin{cases} S_{1}(ω) = a_{1} ω^{2} + b_{1} ω + c_{1}, & ω ∈ Z_{1} \\ S_{2}(ω) = a_{2} ω^{2} + b_{2} ω + c_{2}, & ω ∈ Z_{2} \\ \vdots \\ S_{L}(ω) = a_{L} ω^{2} + b_{L} ω + c_{L}, & ω ∈ Z_{L} \\ S_{0}(ω) = 0, & ω ∈ [0, \frac{F_{s}}{2}] \text{ and } ω ∉ Z_{j} \end{cases}
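One possible way to assemble the piecewise function S(ω) in code is sketched below, assuming three-point peaks, regions Z_j spanning the two neighbouring bins, and S_0(ω) = 0. The names `fit_parabola` and `reconstruct` and the sample data are hypothetical. Regions of neighbouring peaks may share an endpoint in this sketch; the first matching region is used there.

```python
def fit_parabola(w, y):
    """Quadratic through three points, via Newton divided differences."""
    w0, w1, w2 = w
    y0, y1, y2 = y
    d01 = (y1 - y0) / (w1 - w0)
    d12 = (y2 - y1) / (w2 - w1)
    a = (d12 - d01) / (w2 - w0)
    b = d01 - a * (w0 + w1)
    return a, b, y0 - a * w0 * w0 - b * w0

def reconstruct(freqs, Y):
    """Return S(w): a parabola on each region Z_j around a peak, zero elsewhere."""
    pieces = []  # one (z_start, z_end, a, b, c) tuple per peak
    for i in range(1, len(Y) - 1):
        if Y[i] > max(Y[i - 1], Y[i + 1]):
            a, b, c = fit_parabola(freqs[i - 1:i + 2], Y[i - 1:i + 2])
            pieces.append((freqs[i - 1], freqs[i + 1], a, b, c))

    def S(w):
        for z_start, z_end, a, b, c in pieces:
            if z_start <= w <= z_end:
                return a * w * w + b * w + c   # S_j(w) inside Z_j
        return 0.0                              # S_0(w) = 0 outside every Z_j
    return S

# Hypothetical bin frequencies (Hz) and magnitudes |X_i|
freqs = [0.0, 100.0, 200.0, 300.0, 400.0, 500.0]
Y = [0.0, 1.0, 3.0, 2.0, 0.5, 0.2]
S = reconstruct(freqs, Y)
```

Here the only peak is the bin at 200 Hz, so S follows a parabola on [100, 300] and is zero elsewhere; the parabola's maximum lies slightly above 200 Hz because the right-hand neighbour is larger than the left-hand one.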

Then, in step 140, the fundamental frequency is computed. Since step 130 has produced a function S(ω) with a continuous domain, the fundamental frequency can be derived directly from the properties of S(ω). For example, a search is carried out over the range in which the pitch may lie (such as 50 Hz to 500 Hz), the search criterion being to find the frequency that satisfies the following formula:

ω_{p} = \underset{ω}{\arg \max} Σ_{k = 1}^{N (ω)} {| S (kω) |}^{2}

Here N(ω) is the number of harmonics when ω is taken as the fundamental frequency, and ω_p is the fundamental frequency. Note that this formula is one specific example of a search criterion; in practice other formulas can be used, for example with the square replaced by a fourth power or a first power. The essence of the formula for ω_p is that a candidate fundamental frequency and several of its multiples are considered together during the pitch search, which makes the search result more accurate; the exact form of the formula may vary.
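The search criterion can be sketched as a grid search over candidate frequencies. Several things here are assumptions rather than details from the text: N(ω) is taken as the number of multiples of ω up to the Nyquist frequency, candidates are tried on a 1 Hz grid, and the spectrum `S` is a toy function with unit bumps at multiples of 150 Hz.

```python
def S(w, f0=150.0, n_harm=5, half_width=20.0):
    """Toy reconstructed spectrum: unit parabolic bumps at f0, 2*f0, ..., zero elsewhere."""
    m = round(w / f0)                      # nearest harmonic number
    if m < 1 or m > n_harm:
        return 0.0
    d = (w - m * f0) / half_width
    return max(0.0, 1.0 - d * d)

def harmonic_sum(w, spectrum, nyquist):
    """Sum of |S(k*w)|^2 for k = 1..N(w), with N(w) assumed to be floor(nyquist / w)."""
    n_harmonics = int(nyquist // w)
    return sum(abs(spectrum(k * w)) ** 2 for k in range(1, n_harmonics + 1))

def search_pitch(spectrum, nyquist, lo, hi, step=1.0):
    """Return the candidate in [lo, hi] with the largest harmonic sum."""
    best_w, best_score = lo, float("-inf")
    w = lo
    while w <= hi:
        score = harmonic_sum(w, spectrum, nyquist)
        if score > best_score:
            best_w, best_score = w, score
        w += step
    return best_w

pitch = search_pitch(S, nyquist=4000.0, lo=100.0, hi=500.0)
```

The example searches from 100 Hz rather than 50 Hz on purpose: under this criterion a candidate at half the true fundamental (75 Hz here) lands on the same peaks and would tie, so the choice of search range matters.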

In this embodiment, before the fundamental frequency is computed, the frequency-domain signal in use is first reconstructed to generate a reconstruction function with a continuous domain. In the part of the domain near each frequency-domain peak, the function is curve-fitted to the corresponding frequency-domain signal; in the rest of the domain, the corresponding frequency-domain signal is suppressed. Because a candidate fundamental frequency and its multiples usually appear as peaks, retaining the frequency-domain signal near each peak while substantially weakening it elsewhere improves the accuracy and noise immunity of the fundamental frequency computation. The frequency-domain signal obtained by the transform is discrete; making the domain continuous allows the fundamental frequency to be searched more precisely in the spectrum represented by the reconstruction function.

It is worth mentioning that in this embodiment the function S_0(ω) constructed on the part of the domain not covered by any Z_j is S_0(ω) = 0, which weakens irrelevant frequency components to the greatest extent and further improves the accuracy and noise immunity of the fundamental frequency computation. In practice, S_0(ω) can also be set to a very small value, and the fundamental frequency can still be found fairly accurately.

The second embodiment of the present invention relates to a method for computing the fundamental frequency of speech. It is largely the same as the first embodiment; the difference is that in the first embodiment the continuous function S_j(ω) is constructed by quadratic interpolation through the absolute value of the frequency-domain signal at the peak and those of its two neighbours, whereas in this embodiment S_j(ω) can be constructed by fitting piecewise straight lines or by fitting a cubic polynomial.
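The piecewise straight-line variant can be sketched as plain linear interpolation between the peak and its neighbours. The function name `piecewise_linear` and the sample points are hypothetical, and returning 0 outside the region corresponds to the choice S_0(ω) = 0.

```python
def piecewise_linear(w_pts, y_pts):
    """Return S_j(w) that joins consecutive (w, y) points with straight segments."""
    def S_j(w):
        for k in range(len(w_pts) - 1):
            w0, w1 = w_pts[k], w_pts[k + 1]
            if w0 <= w <= w1:
                t = (w - w0) / (w1 - w0)
                return y_pts[k] + t * (y_pts[k + 1] - y_pts[k])
        return 0.0   # outside the region Z_j
    return S_j

# Hypothetical peak at 200 Hz with neighbours at 100 Hz and 300 Hz
S_1 = piecewise_linear([100.0, 200.0, 300.0], [1.0, 3.0, 2.0])
```

Unlike the quadratic fit, the maximum of this curve always sits exactly on the peak bin, so it does not refine the peak position between bins.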

The method embodiments of the present invention can be implemented in software, hardware, firmware and so on. Whether the invention is implemented in software, hardware or firmware, the instruction code can be stored in any type of computer-accessible memory (for example permanent or modifiable, volatile or non-volatile, solid-state or non-solid-state, fixed or removable media, and so on). Likewise, the memory can be, for example, Programmable Array Logic (PAL), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), Electrically Erasable Programmable ROM (EEPROM), a magnetic disk, an optical disc, a Digital Versatile Disc (DVD), and so on.

The third embodiment of the present invention relates to an apparatus for computing the fundamental frequency of speech, as shown in Fig. 2, comprising: a transform unit, for transforming the time-domain speech signal into a discrete frequency-domain signal X_i, where i = 1, 2, ..., N; a peak computing unit, for finding in |X_i| each peak M_j that is a local maximum, where j = 1, 2, ..., L, L is the number of peaks, and | | denotes taking the absolute value; a reconstruction unit, for constructing, in the domain of the discrete frequency-domain signal, L non-overlapping regions Z_j, each Z_j having a predetermined size and covering one D_j, where D_j is the value in the domain corresponding to M_j, for constructing, with each Z_j as its domain, a continuous function S_j(ω), ω ∈ Z_j, that satisfies | S_j(ω_i) - |X_i| | < C1, where ω_i is the value in the domain corresponding to X_i and C1 is a positive constant, and for constructing, on the part of the domain not covered by any Z_j, a function S_0(ω), with ω ∈ [0, F_s/2] and ω ∉ Z_j, where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|; and a pitch computing unit, for computing the fundamental frequency using S(ω), formed by combining each S_j(ω) with S_0(ω), as the spectrum.

The pitch computing unit computes the fundamental frequency as follows: a search is carried out over the range in which the pitch may lie, the search criterion being to find the frequency that satisfies the following formula:

ω_{p} = \underset{ω}{\arg \max} Σ_{k = 1}^{N (ω)} {| S (kω) |}^{2}

Here N(ω) is the number of harmonics when ω is taken as the fundamental frequency, and ω_p is the fundamental frequency.

The transform unit can use an FFT, a discrete cosine transform, a modified discrete cosine transform or the like to transform the time-domain speech signal into a discrete frequency-domain signal.

The reconstruction unit can construct S_j(ω) in one of the following ways: quadratic interpolation through the absolute value of the frequency-domain signal at the peak and those of its two neighbours; fitting piecewise straight lines; or fitting a cubic polynomial.
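To show how the four logical units fit together, the sketch below chains them as four functions. It is an illustration under several assumptions, not the patented implementation: a naive DFT stands in for the FFT, regions Z_j span the two bins next to each peak, S_0(ω) = 0, N(ω) is taken as floor(Nyquist / ω), candidates lie on a 1 Hz grid, and all function names and the synthetic test frame are hypothetical.

```python
import cmath
import math

def transform_unit(x):
    """Magnitudes |X_i| of a naive DFT for bins 0..N/2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def peak_unit(Y):
    """Three-point local maxima."""
    return [i for i in range(1, len(Y) - 1) if Y[i] > max(Y[i - 1], Y[i + 1])]

def reconstruction_unit(freqs, Y, peaks):
    """Quadratic S_j on each Z_j = [freqs[i-1], freqs[i+1]], zero elsewhere."""
    pieces = []
    for i in peaks:
        w0, w1, w2 = freqs[i - 1], freqs[i], freqs[i + 1]
        y0, y1, y2 = Y[i - 1], Y[i], Y[i + 1]
        d01, d12 = (y1 - y0) / (w1 - w0), (y2 - y1) / (w2 - w1)
        a = (d12 - d01) / (w2 - w0)
        b = d01 - a * (w0 + w1)
        pieces.append((w0, w2, a, b, y0 - a * w0 * w0 - b * w0))

    def S(w):
        for z_start, z_end, a, b, c in pieces:
            if z_start <= w <= z_end:
                return a * w * w + b * w + c
        return 0.0
    return S

def pitch_unit(S, nyquist, lo, hi, step=1.0):
    """Candidate in [lo, hi] maximising the sum of S(k*w)^2 over its harmonics."""
    best_w, best = lo, float("-inf")
    w = lo
    while w <= hi:
        score = sum(S(k * w) ** 2 for k in range(1, int(nyquist // w) + 1))
        if score > best:
            best_w, best = w, score
        w += step
    return best_w

# Hypothetical frame: 200 Hz fundamental plus two harmonics, 8000 Hz sampling
fs, n = 8000, 160
frame = [sum(math.sin(2 * math.pi * 200 * h * t / fs) / h for h in (1, 2, 3))
         for t in range(n)]
Y = transform_unit(frame)
freqs = [k * fs / n for k in range(len(Y))]
S = reconstruction_unit(freqs, Y, peak_unit(Y))
pitch = pitch_unit(S, fs / 2, lo=150.0, hi=500.0)
```

The search range starts at 150 Hz so that the sub-harmonic candidate at 100 Hz, which would hit the same peaks, is excluded from this toy example.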

Note that each unit mentioned in this embodiment is a logical unit. Physically, a logical unit can be one physical unit, part of a physical unit, or a combination of several physical units. The physical implementation of these logical units is not in itself what matters most; the combination of the functions they implement is the key to solving the technical problem addressed by the present invention.

In addition, to highlight the innovative part of the present invention, this embodiment does not introduce units that are not closely related to solving the technical problem addressed by the invention; this does not mean that the apparatus embodiment contains no other units.

Although the present invention has been illustrated and described with reference to certain preferred embodiments, those of ordinary skill in the art will understand that various changes in form and detail may be made to it without departing from the spirit and scope of the present invention.

Claims

1. A method for computing the fundamental frequency of speech, characterized by comprising the following steps:

and

where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|;

2. the method for computing voice fundamental frequency according to claim 1 is characterized in that, and is said with the step of S (ω) as frequency spectrum calculating fundamental frequency, realizes through following substep:

Fundamental tone possibly have scope search, the criterion of search is to find the frequency that satisfies following formula:

ω_{p} = \underset{ω}{\arg \max} Σ_{k = 1}^{N (ω)} {| S (kω) |}^{2}

where N(ω) is the number of harmonics when ω is taken as the fundamental frequency, and ω_p is the computed fundamental frequency.

3. the method for computing voice fundamental frequency according to claim 2 is characterized in that, is transformed in the step of discrete frequency-region signal at the voice signal with time domain, adopts one of following mapping mode:

FFT, discrete cosine transform, modified discrete cosine transform.

4. the method for computing voice fundamental frequency according to claim 3 is characterized in that,

Said at | X _i| in find out each peak value M as local maximum _jStep comprise following substep:

Calculate Y _i=| X _i|;

Search for all and satisfy Y _I＞max (Y _I+1, Y _I-1) Y _iAs peak value M _j

5. the method for computing voice fundamental frequency according to claim 4 is characterized in that,

At said structure continuous function S _jIn the step (ω), adopt one of following mode to realize S _jStructure (ω):

With the corresponding frequency-region signal absolute value of peak value with and former and later two frequency-region signal absolute values carry out the binomial interpolation; Or fit to the segmentation straight line; Or come match with cubic polynomial.

6. the method for computing voice fundamental frequency according to claim 5 is characterized in that,

Said S ₀(ω)=0.

7. An apparatus for computing the fundamental frequency of speech, characterized by comprising:

A reconstruction unit, for constructing, in the domain of the discrete frequency-domain signal, L non-overlapping regions Z_j, each Z_j having a predetermined size and covering one D_j, where D_j is the value in the domain corresponding to M_j; for constructing, with each Z_j as its domain, a continuous function S_j(ω), ω ∈ Z_j, that satisfies | S_j(ω_i) - |X_i| | < C1, where ω_i is the value in the domain corresponding to X_i and C1 is a positive constant; and for constructing, on the part of the domain not covered by any Z_j, a function S_0(ω), with ω ∈ [0, F_s/2] and ω ∉ Z_j, where F_s is the sampling rate, satisfying S_0(ω_i) < |X_i|;

8. The apparatus for computing the fundamental frequency of speech according to claim 7, characterized in that

the pitch computing unit computes the fundamental frequency as follows:

ω_{p} = \underset{ω}{\arg \max} Σ_{k = 1}^{N (ω)} {| S (kω) |}^{2}

9. The apparatus for computing the fundamental frequency of speech according to claim 8, characterized in that

the transform unit uses one of the following transforms to transform the time-domain speech signal into a discrete frequency-domain signal:

FFT, discrete cosine transform, modified discrete cosine transform.

10. The apparatus for computing the fundamental frequency of speech according to claim 9, characterized in that

the reconstruction unit constructs S_j(ω) in one of the following ways: