EP0782128A1

EP0782128A1 - Method of analysing by linear prediction an audio frequency signal, and its application to a method of coding and decoding an audio frequency signal

Info

Publication number: EP0782128A1
Application number: EP96402715A
Authority: EP
Inventors: Catherine Quinquis; Alain Le Guyader
Original assignee: France Telecom SA
Current assignee: Orange SA
Priority date: 1995-12-15
Filing date: 1996-12-12
Publication date: 1997-07-02
Anticipated expiration: 2016-12-12
Also published as: KR100421226B1; JP3678519B2; DE69608947T2; KR970050107A; FR2742568A1; JPH09212199A; DE69608947D1; EP0782128B1; US5787390A; CN1159691A; FR2742568B1

Abstract

The method determines short-term spectral parameters of an audio frequency signal (S0(n)) using q successive prediction stages, each stage being indexed by p with 1≤p≤q. Each stage p determines a number Mp of linear prediction coefficients a₁^p, ..., a_Mp^p of its input signal. The analysed signal S0(n) is the input of the first stage, and the input sp(n) of stage p+1 is the input of stage p filtered by a filter with transfer function $A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$. The number of linear prediction coefficients can increase from one stage to the next.

Description

The present invention relates to a method of linear prediction analysis of an audio frequency signal. The method finds a particular, but not exclusive, application in predictive audio coders, notably analysis-by-synthesis coders, of which the most widespread type is the CELP ("Code-Excited Linear Prediction") coder.

Analysis-by-synthesis predictive coding techniques are currently in widespread use for coding speech in the telephone band (300-3400 Hz) at bit rates as low as 8 kbit/s while maintaining telephone quality. For the audio band (of the order of 20 kHz), transform coding techniques are used for broadcasting and storage of voice and music signals. However, these techniques involve relatively long coding delays (greater than 100 ms), which makes participation difficult in group communications where interactivity is very important. Predictive techniques produce a shorter delay, which depends essentially on the length of the linear prediction analysis frames (typically 10 to 20 ms), and for this reason they find applications even for coding voice and/or music signals whose bandwidth is greater than the telephone band.

The predictive coders used for bit rate compression model the spectral envelope of the signal. This modelling results from a linear prediction analysis of order M (typically M≃10 in narrow band), which consists in determining M linear prediction coefficients a_i of the input signal. These coefficients characterize a synthesis filter used in the decoder, whose transfer function is of the form 1/A(z) with

$A(z) = 1 + \sum_{i=1}^{M} a_{i} z^{-i} \quad (1)$
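As an illustration of how the M coefficients a_i can be obtained, the following is a minimal sketch of the classical autocorrelation method solved by the Levinson-Durbin recursion. The text does not prescribe this particular algorithm; the function names, the absence of windowing and the test signal are assumptions made for the example. The sign convention is A(z) = 1 + Σ a_i z^{-i}, so the predicted sample is -Σ a_i s(n-i).

```python
def autocorrelation(signal, max_lag):
    """Return R(0), ..., R(max_lag) of one analysis frame (no window applied)."""
    n = len(signal)
    return [sum(signal[t] * signal[t - k] for t in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + sum_i a_i z^-i.

    Returns ([1, a_1, ..., a_order], residual_energy).
    Assumes r[0] > 0 (a non-silent frame).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient of stage m of the recursion.
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a, err


# Hypothetical test frame: a decaying exponential, which a first-order
# predictor models almost exactly (a_1 close to -0.9).
frame = [0.9 ** n for n in range(200)]
coeffs, energy = levinson_durbin(autocorrelation(frame, 1), 1)
```

In a real coder the frame would normally be windowed before the autocorrelation is computed, and the order would be closer to the M≃10 mentioned above.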

Linear prediction analysis has a field of application broader than speech coding. In certain applications, the prediction order M is one of the variables which the linear prediction analysis aims to obtain, this variable being influenced by the number of peaks present in the spectrum of the analysed signal (see US-A-5 142 581).

The filter calculated by the linear prediction analysis may have various structures, leading to different choices of parameters for representing the coefficients (the coefficients a_i themselves, the LAR, LSF and LSP parameters, the reflection or PARCOR coefficients, etc.). Before the advent of digital signal processors (DSPs), it was common to use recursive structures for the calculated filter, for example structures employing PARCOR coefficients of the type described in the article by F. Itakura and S. Saito, "Digital Filtering Techniques for Speech Analysis and Synthesis", Proc. of the 7th International Congress on Acoustics, Budapest 1971, pages 261-264 (see FR-A-2 284 946 or US-A-3 975 587).

In analysis-by-synthesis coders, the coefficients a_i are also used to construct a perceptual weighting filter, which the coder uses to determine the excitation signal to be applied to the short-term synthesis filter in order to obtain a synthetic signal representative of the speech signal. This perceptual weighting accentuates the portions of the spectrum where coding errors are most perceptible, that is to say the inter-formant regions. The transfer function W(z) of the perceptual weighting filter is usually of the form

$W(z) = \frac{A(z/\gamma_{1})}{A(z/\gamma_{2})} \quad (2)$

where γ₁ and γ₂ are two spectral expansion coefficients such that 0≤γ₂≤γ₁≤1. An improvement in noise masking was provided by E. Ordentlich and Y. Shoham in their article "Low-Delay Code-Excited Linear Predictive Coding of Wideband Speech at 32 kbps", Proc. ICASSP, Toronto, May 1991, pages 9-12. This improvement consists in combining, for the perceptual weighting, the filter W(z) with another filter modelling the tilt of the spectrum. It is particularly valuable when coding signals with a high spectral dynamic range (wideband or audio band), for which the authors showed a significant improvement in the subjective quality of the reconstructed signal.
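The spectral expansion used in W(z) can be made concrete: replacing z by z/γ multiplies each coefficient a_i by γ^i. The sketch below, under that standard identity, builds the numerator A(z/γ₁) and denominator A(z/γ₂) and applies the resulting pole-zero filter to a frame. The function names, the zero initial filter state and the sample values γ₁=0.9, γ₂=0.6 are assumptions chosen for illustration only.

```python
def expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i (a[0] is 1)."""
    return [c * gamma ** i for i, c in enumerate(a)]


def pole_zero_filter(num, den, x):
    """Filter x by num(z)/den(z), with den[0] == 1 and zero initial state."""
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y


def perceptual_weighting(a, gamma1, gamma2, x):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) to the samples x."""
    if not 0.0 <= gamma2 <= gamma1 <= 1.0:
        raise ValueError("expected 0 <= gamma2 <= gamma1 <= 1")
    return pole_zero_filter(expand(a, gamma1), expand(a, gamma2), x)


# Hypothetical first-order A(z) and expansion coefficients.
a = [1.0, -0.9]
weighted = perceptual_weighting(a, 0.9, 0.6, [1.0, 0.0, 0.0, 0.0])
```

With γ₁ = γ₂ the numerator and denominator cancel and the filter leaves the signal unchanged; with γ₁ = 1 and γ₂ = 0 it reduces to the inverse filter A(z).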

In most current CELP decoders, the linear prediction coefficients a_i are also used to define a post-filter which attenuates the frequency regions between the formants and the harmonics of the speech signal without modifying the tilt of the signal spectrum. A usual form of the transfer function of this post-filter is:

$H_{PF}(z) = G_{P} \frac{A(z/\beta_{1})}{A(z/\beta_{2})} (1 - \mu r_{1} z^{-1}) \quad (3)$

where G_P is a gain factor compensating for the attenuation of the filters, β₁ and β₂ are coefficients such that 0≤β₁≤β₂≤1, µ is a positive constant and r₁ denotes the first reflection coefficient, which depends on the coefficients a_i.

Modelling the spectral envelope of the signal by the coefficients a_i is therefore an essential element of the coding and decoding process, in that it must represent the spectral content of the signal to be reconstructed in the decoder, and in that it controls both the masking of the quantization noise and the post-filtering in the decoder.

For signals with a high spectral dynamic range, the linear prediction analysis usually employed does not succeed in modelling the envelope of the spectrum faithfully. Speech signals are often substantially more energetic at low frequencies than at high frequencies, so that linear prediction analysis does yield precise modelling at low frequencies, but at the expense of the modelling of the spectrum at higher frequencies. This drawback becomes particularly troublesome in the case of wideband coding.

One object of the present invention is to improve the modelling of the spectrum of an audio frequency signal in a system employing a linear prediction analysis method. Another object is to make the performance of such a system more uniform across different input signals (speech, music, sinusoids, DTMF signals, etc.), different bandwidths (telephone band, wideband, hi-fi band, etc.), and different recording conditions (directional microphone, acoustic antenna, etc.) and filtering conditions.

The invention thus proposes a method of linear prediction analysis of an audio frequency signal, in order to determine spectral parameters dependent on a short-term spectrum of the audio frequency signal, comprising q successive prediction stages, q being an integer greater than 1. At each prediction stage p (1≤p≤q), parameters are determined which represent a predefined number Mp of linear prediction coefficients a₁^p, ..., a_Mp^p of an input signal of said stage, the analysed audio frequency signal constituting the input signal of the first stage, and the input signal of stage p+1 consisting of the input signal of stage p filtered by a filter with transfer function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

The number Mp of linear prediction coefficients can in particular increase from one stage to the next. Thus, the first stage can account fairly faithfully for the general tilt of the spectrum of the signal, while the following stages refine the representation of the formants of the signal. In the case of signals with a high dynamic range, this avoids giving too much weight to the most energetic regions at the risk of poor modelling of other frequency regions which may be perceptually important.
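The staged analysis can be sketched as a loop in which each stage estimates its own coefficients and then passes its prediction residual to the next stage. This is only an illustrative sketch: the per-stage solver (autocorrelation method with Levinson-Durbin), the function names and the example orders (2, 8) are assumptions, not values taken from the text.

```python
def autocorrelation(x, max_lag):
    """Return R(0), ..., R(max_lag) of the samples x."""
    return [sum(x[t] * x[t - k] for t in range(k, len(x)))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin; returns [1, a_1, ..., a_order] for A(z) = 1 + sum a_i z^-i."""
    r = autocorrelation(x, order)
    a, err = [1.0] + [0.0] * order, r[0]
    for m in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predicted input: stop early
            break
        k = -(r[m] + sum(a[i] * r[m - i] for i in range(1, m))) / err
        prev = a[:]
        for i in range(1, m):
            a[i] = prev[i] + k * prev[m - i]
        a[m] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(a, x):
    """Apply the FIR filter A(z) to x (zero initial state)."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]


def multistage_lpc(s0, orders):
    """Run q = len(orders) stages; returns the q coefficient sets and the
    signal left after the last stage."""
    stages, signal = [], list(s0)
    for order in orders:
        a_p = lpc(signal, order)              # coefficients of stage p
        stages.append(a_p)
        signal = inverse_filter(a_p, signal)  # input signal of stage p+1
    return stages, signal


# Hypothetical input and stage orders (low order first, higher order next).
s0 = [0.9 ** n for n in range(200)]
stages, residual = multistage_lpc(s0, [2, 8])
```

Choosing a small order for the first stage lets it capture mainly the overall spectral tilt, so that the later, higher-order stage works on a flatter signal.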

A second aspect of the invention relates to an application of this linear prediction analysis method in an analysis-by-synthesis audio frequency coder with "forward" adaptation. The invention thus proposes a method of coding an audio frequency signal comprising the following steps:

linear prediction analysis of an audio frequency signal digitized in successive frames to determine parameters defining a short-term synthesis filter;
determination of excitation parameters defining an excitation signal to be applied to the short-term synthesis filter to produce a synthetic signal representative of the audio frequency signal; and
production of quantization values of the parameters defining the short-term synthesis filter and of the excitation parameters,

in which the linear prediction analysis is a process with q successive stages as defined above, and in which the short-term synthesis filter has a transfer function of the form 1/A(z) with

$A(z) = \prod_{p=1}^{q} A^{p}(z)$

The transfer function A(z) thus obtained can also be used to define, according to formula (2), the transfer function of the perceptual weighting filter when the coder is an analysis-by-synthesis coder with closed-loop determination of the excitation signal. Another advantageous possibility is to adopt spectral expansion coefficients γ₁ and γ₂ which can vary from one stage to the next, that is to say to give the perceptual weighting filter a transfer function of the form

$W(z) = \prod_{p=1}^{q} \frac{A^{p}(z/\gamma_{1}^{p})}{A^{p}(z/\gamma_{2}^{p})}$

where γ₁^p, γ₂^p denote pairs of spectral expansion coefficients such that 0≤γ₂^p≤γ₁^p≤1 for 1≤p≤q.
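One way to read a weighting filter with stage-dependent expansion coefficients is as a product over the stages, each stage polynomial being expanded with its own pair of coefficients before the polynomials are multiplied together. The sketch below computes the numerator and denominator coefficients under that reading; the product form, the function names and the numeric values are assumptions for illustration.

```python
def expand(a, gamma):
    """Coefficients of A(z/gamma): a_i is scaled by gamma**i."""
    return [c * gamma ** i for i, c in enumerate(a)]


def poly_mul(u, v):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def staged_weighting(stages, gammas):
    """Return (numerator, denominator) of the staged weighting filter.

    stages: list of [1, a_1^p, ..., a_Mp^p], one list per stage.
    gammas: list of (gamma1_p, gamma2_p) pairs, one pair per stage.
    """
    num, den = [1.0], [1.0]
    for a_p, (g1, g2) in zip(stages, gammas):
        if not 0.0 <= g2 <= g1 <= 1.0:
            raise ValueError("expected 0 <= gamma2 <= gamma1 <= 1")
        num = poly_mul(num, expand(a_p, g1))
        den = poly_mul(den, expand(a_p, g2))
    return num, den


# Hypothetical two-stage example with a different pair for each stage.
stages = [[1.0, -0.5], [1.0, 0.0, 0.25]]
num, den = staged_weighting(stages, [(0.9, 0.65), (0.9, 0.7)])
```

Using the same pair for every stage gives back formula (2) applied to the composite polynomial, since expanding a product of polynomials with a single γ equals the product of the expanded factors.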

The invention is also applicable in an associated decoder. The decoding method thus implemented according to the invention comprises the following steps:

quantization values of parameters defining a short-term synthesis filter and of excitation parameters are received, the parameters defining the short-term synthesis filter comprising a number q>1 of sets of linear prediction coefficients, each set comprising a predefined number of coefficients;
an excitation signal is produced on the basis of the quantization values of the excitation parameters;
a synthetic audio frequency signal is produced by filtering the excitation signal with a synthesis filter having a transfer function of the form 1/A(z) with
$A(z) = \prod_{p=1}^{q} \left( 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i} \right)$
where the coefficients a₁^p, ..., a_Mp^p correspond to the p-th set of linear prediction coefficients for 1≤p≤q.
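Because the decoder receives q separate coefficient sets, the synthesis filter can be sketched as a cascade of q all-pole sections, one per set, which is equivalent to filtering once by the inverse of the product of the stage polynomials. The following sketch assumes zero filter state at the start of the signal; the function names and sample values are illustrative assumptions.

```python
def all_pole(a, x):
    """Filter x by 1/A(z): y(n) = x(n) - sum_{i>=1} a_i * y(n-i)."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(a[i] * y[n - i]
                          for i in range(1, len(a)) if n - i >= 0))
    return y


def synthesize(stages, excitation):
    """Pass the excitation through one all-pole section per coefficient set."""
    signal = list(excitation)
    for a_p in stages:
        signal = all_pole(a_p, signal)
    return signal


# Hypothetical decoded coefficient sets (q = 2) and a unit-impulse excitation.
decoded_sets = [[1.0, -0.5], [1.0, -0.5]]
synthetic = synthesize(decoded_sets, [1.0, 0.0, 0.0, 0.0])
```

A frame-based decoder would additionally carry the filter memories from one frame to the next, which this sketch omits.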

This transfer function A(z) can also be used to define a post-filter whose transfer function includes, as in formula (3) above, a term of the form A(z/β₁)/A(z/β₂), where β₁ and β₂ denote coefficients such that 0≤β₁≤β₂≤1.

An advantageous variant consists in replacing this term of the transfer function of the post-filter by:

$\prod_{p=1}^{q} \frac{A^{p}(z/\beta_{1}^{p})}{A^{p}(z/\beta_{2}^{p})}$

where β₁^p, β₂^p denote pairs of coefficients such that 0≤β₁^p≤β₂^p≤1 for 1≤p≤q.

The invention also applies to audio frequency coders with "backward" adaptation. The invention thus proposes a method of coding a first audio frequency signal digitized in successive frames, comprising the following steps:

linear prediction analysis of a second audio frequency signal to determine parameters defining a short-term synthesis filter;
determination of excitation parameters defining an excitation signal to be applied to the short-term synthesis filter to produce a synthetic signal representative of the first audio frequency signal, this synthetic signal constituting said second audio frequency signal for at least one subsequent frame; and
production of quantization values of the excitation parameters,

For implementation in an associated decoder, the invention proposes a method of decoding a bit stream in order to construct, in successive frames, an audio frequency signal coded by said bit stream, comprising the following steps:

quantization values of excitation parameters are received;
an excitation signal is produced on the basis of the quantization values of the excitation parameters;
a synthetic audio frequency signal is produced by filtering the excitation signal with a short-term synthesis filter;
a linear prediction analysis of the synthetic signal is performed to obtain coefficients of the short-term synthesis filter for at least one subsequent frame,

in which the linear prediction analysis is a process with q successive stages as defined above, and in which the short-term synthesis filter has a transfer function of the form 1/A(z) with

$A(z) = \prod_{p=1}^{q} A^{p}(z)$
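The backward-adaptive decoding loop can be sketched as follows: each frame of excitation is filtered with the current synthesis filter, and the synthetic frame just produced is then analysed to obtain the filter for the next frame, so that no filter parameters need to be transmitted. In this sketch `analyse` is a hypothetical callable standing in for the q-stage analysis and returning the composite coefficients [1, a_1, ...]; the initial filter and the choice to analyse only the most recent frame are also assumptions.

```python
def backward_decode(excitation_frames, analyse, initial_a):
    """Decode frames with a synthesis filter re-estimated from past output.

    excitation_frames: list of lists of excitation samples.
    analyse: callable mapping a synthetic frame to [1, a_1, ..., a_M].
    initial_a: filter used for the first frame, e.g. [1.0] (no prediction).
    """
    a, output = list(initial_a), []
    for frame in excitation_frames:
        start = len(output)
        for xn in frame:
            n = len(output)
            # All-pole synthesis 1/A(z), using earlier output as filter memory.
            output.append(xn - sum(a[i] * output[n - i]
                                   for i in range(1, len(a)) if n - i >= 0))
        # The filter for the next frame comes from the signal just synthesized.
        a = analyse(output[start:])
    return output


# Hypothetical stand-in analysis that always returns the same filter.
def fixed_analysis(_frame):
    return [1.0, -0.5]


decoded = backward_decode([[1.0, 1.0], [0.0, 0.0]], fixed_analysis, [1.0])
```

The coder runs the same loop on its own synthetic signal, which keeps the coder and decoder filters identical as long as the bit stream is received without errors.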

The invention also makes it possible to produce mixed audio frequency coders/decoders, that is to say ones which use both "forward" and "backward" adaptation schemes, the first linear prediction stage or stages corresponding to a "forward" analysis and the last stage or stages to a "backward" analysis. The invention thus proposes a method of coding a first audio frequency signal digitized in successive frames, comprising the following steps:

linear prediction analysis of the first audio frequency signal to determine parameters defining a first component of a short-term synthesis filter;
determination of excitation parameters defining an excitation signal to be applied to the short-term synthesis filter to produce a synthetic signal representative of the first audio frequency signal;
production of quantization values of the parameters defining the first component of the short-term synthesis filter and of the excitation parameters;
filtering of the synthetic signal by a filter whose transfer function is the inverse of the transfer function of the first component of the short-term synthesis filter; and
linear prediction analysis of the filtered synthetic signal to obtain coefficients of a second component of the short-term synthesis filter for at least one subsequent frame,

in which the linear prediction analysis of the first audio frequency signal is a process with q_F successive stages, q_F being an integer at least equal to 1, said process with q_F stages comprising, at each prediction stage p (1≤p≤q_F), the determination of parameters representing a predefined number MFp of linear prediction coefficients a₁^{F,p}, ..., a_MFp^{F,p} of an input signal of said stage, the first audio frequency signal constituting the input signal of the first stage, and the input signal of stage p+1 consisting of the input signal of stage p filtered by a filter with transfer function

$A^{F,p}(z) = 1 + \sum_{i=1}^{MFp} a_{i}^{F,p} z^{-i}$

the first component of the short-term synthesis filter having a transfer function of the form 1/A^F(z) with

$A^{F}(z) = \prod_{p=1}^{q_{F}} A^{F,p}(z)$

and in which the linear prediction analysis of the filtered synthetic signal is a process with q_B successive stages, q_B being an integer at least equal to 1, said process with q_B stages comprising, at each prediction stage p (1≤p≤q_B), the determination of parameters representing a predefined number MBp of linear prediction coefficients a₁^{B,p}, ..., a_MBp^{B,p} of an input signal of said stage, the filtered synthetic signal constituting the input signal of the first stage, and the input signal of stage p+1 consisting of the input signal of stage p filtered by a filter with transfer function

$A^{B,p}(z) = 1 + \sum_{i=1}^{MBp} a_{i}^{B,p} z^{-i}$

the second component of the short-term synthesis filter having a transfer function of the form 1/A^B(z) with

$A^{B}(z) = \prod_{p=1}^{q_{B}} A^{B,p}(z)$

and the short-term synthesis filter having a transfer function of the form 1/A(z) with A(z)=A^F(z).A^B(z).
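The mixed scheme can be illustrated by two small operations: the backward component is estimated from the synthetic signal after the forward component has been removed by filtering with A^F(z), and the complete filter polynomial is the product A^F(z).A^B(z). In the sketch below `analyse_backward` is a hypothetical callable standing in for the q_B-stage analysis; all names and numeric values are assumptions for illustration.

```python
def fir(a, x):
    """Apply the FIR filter A(z) to x (zero initial state)."""
    return [sum(a[i] * x[n - i] for i in range(len(a)) if n - i >= 0)
            for n in range(len(x))]


def poly_mul(u, v):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(u) + len(v) - 1)
    for i, ui in enumerate(u):
        for j, vj in enumerate(v):
            out[i + j] += ui * vj
    return out


def update_backward_component(synthetic, a_forward, analyse_backward):
    """Estimate A^B from the synthetic signal filtered by A^F(z)."""
    return analyse_backward(fir(a_forward, synthetic))


def composite_filter(a_forward, a_backward):
    """Coefficients of A(z) = A^F(z) * A^B(z)."""
    return poly_mul(a_forward, a_backward)


# Hypothetical forward component, synthetic frame and stand-in analysis.
a_f = [1.0, -0.5]
synthetic = [1.0, 0.5, 0.25, 0.125]
a_b = update_backward_component(synthetic, a_f, lambda frame: [1.0, -0.25])
a_full = composite_filter(a_f, a_b)
```

Only the forward coefficients need to be quantized and transmitted; the backward coefficients are recomputed identically on both sides from the synthetic signal.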

For implementation in an associated mixed decoder, the invention proposes a method of decoding a bit stream in order to construct, in successive frames, an audio frequency signal coded by said bit stream, comprising the following steps:

quantization values of parameters defining a first component of a short-term synthesis filter and of excitation parameters are received, the parameters defining the first component of the short-term synthesis filter representing a number q_F at least equal to 1 of sets of linear prediction coefficients a₁^{F,p}, ..., a_MFp^{F,p} for 1≤p≤q_F, each set p comprising a predefined number MFp of coefficients, the first component of the short-term synthesis filter having a transfer function of the form 1/A^F(z) with
$A^{F}(z) = \prod_{p=1}^{q_{F}} \left( 1 + \sum_{i=1}^{MFp} a_{i}^{F,p} z^{-i} \right)$
an excitation signal is produced on the basis of the quantization values of the excitation parameters;
a synthetic audio frequency signal is produced by filtering the excitation signal with a short-term synthesis filter with transfer function 1/A(z) with A(z)=A^F(z).A^B(z), 1/A^B(z) representing the transfer function of a second component of the short-term synthesis filter;
the synthetic signal is filtered by a filter with transfer function A^F(z); and
a linear prediction analysis of the filtered synthetic signal is performed to obtain coefficients of the second component of the short-term synthesis filter for at least one subsequent frame,

in which the linear prediction analysis of the filtered synthetic signal is a process with q_B stages as defined above, and in which the short-term synthesis filter has a transfer function of the form 1/A(z)=1/[A^F(z).A^B(z)] with

$A^{B}(z) = \prod_{p=1}^{q_{B}} \left( 1 + \sum_{i=1}^{MBp} a_{i}^{B,p} z^{-i} \right)$

Although particular importance is attached to applications of the invention in the field of analysis-by-synthesis coding and decoding, it should be noted that the multi-stage linear prediction analysis method proposed according to the invention has many other applications in audio signal processing, for example in transform predictive coders, in speech recognition systems, in speech enhancement systems, etc.

Other features and advantages of the present invention will become apparent from the following description of preferred but non-limiting exemplary embodiments, given with reference to the appended drawings, in which:

Figure 1 is a flow chart of a linear prediction analysis method according to the invention;
Figure 2 is a spectral diagram comparing the results of a method according to the invention with those of a conventional linear prediction analysis method;
Figures 3 and 4 are block diagrams of a CELP decoder and a CELP coder capable of implementing the invention;
Figures 5 and 6 are block diagrams of variants of a CELP decoder and coder capable of implementing the invention; and
Figures 7 and 8 are block diagrams of other variants of a CELP decoder and coder capable of implementing the invention.

The audio frequency signal to be analyzed in the method illustrated in Figure 1 is denoted $s^0(n)$. It is assumed to be available in the form of digital samples, the integer n denoting the successive sampling instants. The linear prediction analysis method comprises q successive stages $5_1, ..., 5_p, ..., 5_q$. At each prediction stage $5_p$ (1≤p≤q), a linear prediction of order Mp of an input signal $s^{p-1}(n)$ is carried out. The input signal of the first stage $5_1$ is the audio frequency signal to be analyzed $s^0(n)$, while the input signal of a stage $5_{p+1}$ (1≤p<q) is the signal $s^p(n)$, obtained in a step denoted $6_p$ by filtering the input signal $s^{p-1}(n)$ of the p-th stage $5_p$ with a filter with transfer function

$A^p(z) = 1 + \sum_{i=1}^{Mp} a_i^p z^{-i}$

where the coefficients $a_i^p$ (1≤i≤Mp) are the linear prediction coefficients obtained at stage $5_p$.

The linear prediction analysis methods that can be implemented in the various stages $5_1, ..., 5_q$ are well known in the art.

Reference may be made, for example, to the works "Digital Processing of Speech Signals" by L.R. Rabiner and R.W. Schafer, Prentice-Hall Int., 1978, and "Linear Prediction of Speech" by J.D. Markel and A.H. Gray, Springer Verlag Berlin Heidelberg, 1976. In particular, the Levinson-Durbin algorithm can be used, which comprises the following steps (for each stage $5_p$):

evaluation of the autocorrelations R(i) (0≤i≤Mp) of the input signal $s^{p-1}(n)$ of the stage over an analysis window of Q samples:
$R(i) = \sum_{n=i}^{Q-1} s^*(n) \cdot s^*(n-i)$
with $s^*(n) = s^{p-1}(n) \cdot f(n)$, f(n) denoting a windowing function of length Q, for example a rectangular function or a Hamming function;
recursive evaluation of the coefficients $a_i^p$:
$E(0) = R(0)$
For i going from 1 to Mp, do
$r_i^p = \left[ R(i) + \sum_{j=1}^{i-1} a_j^{p,i-1} \cdot R(i-j) \right] / E(i-1)$
$a_i^{p,i} = -r_i^p$
$E(i) = \left[ 1 - (r_i^p)^2 \right] \cdot E(i-1)$
For j going from 1 to i-1, do
$a_j^{p,i} = a_j^{p,i-1} - r_i^p \cdot a_{i-j}^{p,i-1}$
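The recursion can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the patent: the function names `autocorrelation` and `levinson_durbin` are hypothetical, the window defaults to a rectangular one, and the sign convention is the one used in this description, $A(z) = 1 + \sum_i a_i z^{-i}$. It assumes the analyzed window has non-zero energy.

```python
def autocorrelation(signal, order, window=None):
    """Return R(0..order) of the windowed signal over an analysis window of Q samples."""
    q = len(signal)
    f = window if window is not None else [1.0] * q  # rectangular window by default
    s = [x * w for x, w in zip(signal, f)]
    return [sum(s[n] * s[n - i] for n in range(i, q)) for i in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, refl, err): a_1..a_M, the reflection coefficients r_1..r_M, and E(M).

    Convention: A(z) = 1 + sum_i a[i-1] * z^-i. Assumes r[0] > 0.
    """
    a = []          # coefficients of the previous iteration, a_1..a_{i-1}
    refl = []
    err = r[0]      # E(0) = R(0)
    for i in range(1, order + 1):
        # r_i = [R(i) + sum_{j=1}^{i-1} a_j R(i-j)] / E(i-1)
        k = (r[i] + sum(a[j - 1] * r[i - j] for j in range(1, i))) / err
        # a_j <- a_j - r_i * a_{i-j} for j < i, then a_i = -r_i
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [-k]
        refl.append(k)
        err *= 1.0 - k * k   # E(i) = (1 - r_i^2) E(i-1)
    return a, refl, err


# Autocorrelation of a first-order process with pole 0.9: R(i) proportional to 0.9**i.
coeffs, reflections, residual_energy = levinson_durbin([1.0, 0.9, 0.81], 2)
print(coeffs, reflections, residual_energy)
```

For this input, the first-order predictor already captures the whole correlation structure, so the second reflection coefficient is numerically zero and the residual energy stays at about 0.19.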

The coefficients $a_i^p$ (i = 1, ..., Mp) are taken equal to the $a_i^{p,Mp}$ obtained at the last iteration. The quantity E(Mp) is the energy of the residual prediction error of stage p. The coefficients $r_i^p$, lying between -1 and 1, are called reflection coefficients. They can be represented by log-area ratios $LAR_i^p = LAR(r_i^p)$, the function LAR being defined by $LAR(r) = \log_{10}[(1-r)/(1+r)]$.
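As a small illustration of the LAR mapping defined above, the following Python sketch converts a reflection coefficient to its log-area ratio and back. The function names are hypothetical, and the inverse formula is simply obtained by solving the definition for r; it is not taken from the description.

```python
import math


def lar(r):
    """Log-area ratio of a reflection coefficient r, with -1 < r < 1."""
    return math.log10((1.0 - r) / (1.0 + r))


def lar_inverse(value):
    """Recover the reflection coefficient from its log-area ratio."""
    t = 10.0 ** value          # t = (1 - r) / (1 + r)
    return (1.0 - t) / (1.0 + t)


print(lar(0.5), lar_inverse(lar(0.5)))
```

The mapping stretches the region near |r| = 1, where the spectrum is most sensitive to small changes in r, which is why it is a common domain for scalar quantization.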

In a number of applications, the prediction coefficients obtained need to be quantized. The quantization can be performed on the coefficients $a_i^p$ directly, on the associated reflection coefficients $r_i^p$, or on the log-area ratios $LAR_i^p$. Another possibility is to quantize line spectrum parameters (LSP for "line spectrum pairs", or LSF for "line spectrum frequencies"). The Mp line spectrum frequencies $\omega_i^p$ (1≤i≤Mp), normalized between 0 and π, are such that the complex numbers 1, $\exp(j\omega_2^p)$, $\exp(j\omega_4^p)$, ..., $\exp(j\omega_{Mp}^p)$ are the roots of the polynomial $P^p(z) = A^p(z) - z^{-(Mp+1)} A^p(z^{-1})$ and the complex numbers $\exp(j\omega_1^p)$, $\exp(j\omega_3^p)$, ..., $\exp(j\omega_{Mp-1}^p)$ and -1 are the roots of the polynomial $Q^p(z) = A^p(z) + z^{-(Mp+1)} A^p(z^{-1})$. The quantization can relate to the normalized frequencies $\omega_i^p$ or to their cosines.
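A naive way to locate the line spectrum frequencies from the definitions of P and Q is sketched below in Python. It is only an illustration under stated assumptions: the function name `lsf` and the grid-plus-bisection search are choices made here (the efficient methods are those of the references cited in this description), and the grid is assumed fine enough to separate neighbouring roots.

```python
import cmath
import math


def lsf(a, grid=512):
    """Line spectrum frequencies in (0, pi) for A(z) = 1 + sum_i a[i-1] * z^-i."""
    m = len(a)
    full = [1.0] + list(a) + [0.0]                          # a_0 .. a_{M+1}
    p = [full[k] - full[m + 1 - k] for k in range(m + 2)]   # P(z), antisymmetric
    q = [full[k] + full[m + 1 - k] for k in range(m + 2)]   # Q(z), symmetric

    def on_circle(coeffs, w, use_imag):
        # Rotating by exp(j*w*(M+1)/2) makes P purely imaginary and Q purely real.
        val = cmath.exp(1j * w * (m + 1) / 2) * sum(
            c * cmath.exp(-1j * w * k) for k, c in enumerate(coeffs))
        return val.imag if use_imag else val.real

    roots = []
    for coeffs, use_imag in ((p, True), (q, False)):
        for k in range(1, grid - 1):            # open interval: skips the roots at 0 and pi
            lo, hi = math.pi * k / grid, math.pi * (k + 1) / grid
            f_lo = on_circle(coeffs, lo, use_imag)
            if f_lo * on_circle(coeffs, hi, use_imag) >= 0:
                continue                        # no sign change in this cell
            for _ in range(60):                 # bisection refinement
                mid = 0.5 * (lo + hi)
                f_mid = on_circle(coeffs, mid, use_imag)
                if f_lo * f_mid <= 0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            roots.append(0.5 * (lo + hi))
    return sorted(roots)


# For A(z) = 1 with M = 2: P(z) = 1 - z^-3 and Q(z) = 1 + z^-3.
print(lsf([0.0, 0.0]))
```

In this trivial case the two frequencies are π/3 (a root of Q) and 2π/3 (a root of P), which matches the interleaving of odd-indexed and even-indexed frequencies described above.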

The analysis can be carried out at each prediction stage $5_p$ according to the classical Levinson-Durbin algorithm recalled above. Other, more recently developed algorithms providing the same results can be used advantageously, in particular the split Levinson algorithm (see "A new Efficient Algorithm to Compute the LSP Parameters for Speech Coding", by S. Saoudi, J.M. Boucher and A. Le Guyader, Signal Processing, Vol. 28, 1992, pages 201-212), or the use of Chebyshev polynomials (see "The Computation of Line Spectrum Frequencies Using Chebyshev Polynomials", by P. Kabal and R.P. Ramachandran, IEEE Trans. on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 6, pages 1419-1426, December 1986).

When the multi-stage analysis shown in Figure 1 is carried out in order to define a short-term prediction filter for the audio frequency signal $s^0(n)$, the transfer function A(z) of this filter is given the form

$A(z) = \prod_{p=1}^{q} A^p(z) = \prod_{p=1}^{q} \left( 1 + \sum_{i=1}^{Mp} a_i^p z^{-i} \right) \quad (7)$

Note that this transfer function has the classical general form given by formula (1), with M = M1 + ... + Mq. However, the coefficients $a_i$ of the function A(z) obtained with the multi-stage prediction process generally differ from those provided by the classical single-stage prediction process.

The orders Mp of the linear predictions carried out preferably increase from one stage to the next: M1 < M2 < ... < Mq. Thus, the shape of the spectral envelope of the analyzed signal is modeled relatively coarsely at the first stage $5_1$ (M1 = 2 for example), and this modeling is refined from stage to stage without losing the global information provided by the first stage. This prevents parameters such as the general slope of the spectrum, which are perceptually important, from being insufficiently taken into account, particularly in the case of wideband signals and/or signals with a high spectral dynamic range.
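The cascade of stages can be sketched in Python as follows. This is a hedged illustration, not the patented implementation: the names `lpc`, `inverse_filter` and `multistage_lpc` are hypothetical, a rectangular window is assumed at every stage, and each inverse filter starts from a zero state.

```python
def lpc(signal, order):
    """Levinson-Durbin on a rectangular window; returns a_1..a_order of A(z) = 1 + sum a_i z^-i."""
    q = len(signal)
    r = [sum(signal[n] * signal[n - i] for n in range(i, q)) for i in range(order + 1)]
    a, err = [], r[0]
    for i in range(1, order + 1):
        k = (r[i] + sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [-k]
        err *= 1.0 - k * k
    return a


def inverse_filter(signal, a):
    """Apply the FIR filter A(z) to the signal, assuming a zero initial state."""
    return [x + sum(a[i] * signal[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
            for n, x in enumerate(signal)]


def multistage_lpc(signal, orders):
    """Return one coefficient set per stage, e.g. orders = [2, 13] for M1 = 2, M2 = 13."""
    stages, current = [], list(signal)
    for order in orders:
        a = lpc(current, order)                 # analysis of the stage input s^{p-1}(n)
        stages.append(a)
        current = inverse_filter(current, a)    # s^p(n), input of the next stage
    return stages


# Deterministic toy input, used only to exercise the functions.
toy = [float(((n * 37) % 11) - 5 + ((n * 5) % 7)) for n in range(200)]
print([len(a) for a in multistage_lpc(toy, [2, 4])])
```

Because the second stage only sees the residual of the first, the low-order first stage keeps control of the coarse spectral shape, which is the point made in the paragraph above.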

In a typical embodiment, the number q of successive prediction stages is equal to 2. If the target is a synthesis filter of order M, it is then possible to take M1 = 2 and M2 = M-2, the coefficients $a_i$ of the filter (equation (1)) being given by:

$a_1 = a_1^1 + a_1^2 \quad (9)$

$a_2 = a_2^1 + a_1^1 a_1^2 + a_2^2 \quad (10)$

$a_k = a_2^1 a_{k-2}^2 + a_1^1 a_{k-1}^2 + a_k^2$ for 2 < k ≤ M-2 $\quad (11)$

$a_{M-1} = a_2^1 a_{M-3}^2 + a_1^1 a_{M-2}^2 \quad (12)$

$a_M = a_2^1 a_{M-2}^2 \quad (13)$
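Relations (9) to (13) are simply the coefficients of the product $A^1(z) \cdot A^2(z)$ for M1 = 2. The Python sketch below, with hypothetical function names, evaluates them directly and compares the result with a generic polynomial product.

```python
def compose_two_stage(a1, a2):
    """Relations (9)-(13): a1 = [a_1^1, a_2^1], a2 = [a_1^2, ..., a_{M-2}^2]; returns a_1..a_M."""
    m = len(a2) + 2
    # ext[k] = a_k^2, with a_0^2 = 1 and zeros beyond index M-2
    ext = [1.0] + list(a2) + [0.0, 0.0]
    return [ext[k] + a1[0] * ext[k - 1] + (a1[1] * ext[k - 2] if k >= 2 else 0.0)
            for k in range(1, m + 1)]


def poly_mul(x, y):
    """Coefficients of the product of two polynomials in z^-1."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


a1 = [0.5, -0.25]           # first stage, M1 = 2
a2 = [0.2, 0.1, -0.3]       # second stage, M2 = 3, so M = 5
direct = compose_two_stage(a1, a2)
generic = poly_mul([1.0] + a1, [1.0] + a2)[1:]
print(direct)
print(generic)
```

Both lines print the same five coefficients up to rounding, so for more than two stages the composite filter can be obtained by repeated polynomial multiplication.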

For the representation and, where appropriate, the quantization of the short-term spectrum, it is possible to adopt one of the sets of spectral parameters mentioned above ($a_i^p$, $r_i^p$, $LAR_i^p$, $\omega_i^p$ or $\cos \omega_i^p$ for 1≤i≤Mp) for each of the stages (1≤p≤q), or the same spectral parameters but for the composite filter calculated according to relations (9) to (13) ($a_i$, $r_i$, $LAR_i$, $\omega_i$ or $\cos \omega_i$ for 1≤i≤M). The choice between these representation parameters, or others, depends on the constraints of each particular application.

The graph of Figure 2 shows a comparison of the spectral envelopes of a 30 ms voiced portion of a speech signal, modeled by a classical single-stage linear prediction process with M = 15 (curve II) and by a linear prediction process according to the invention in q = 2 stages with M1 = 2 and M2 = 13 (curve III). The sampling frequency Fe of the signal was 16 kHz. The spectrum of the signal (modulus of its Fourier transform) is represented by curve I. This spectrum is representative of audio frequency signals, which have, on average, more energy at low frequencies than at high frequencies. The spectral dynamic range is sometimes greater than that of Figure 2 (60 dB). Curves II and III correspond to the modeled spectral envelopes $|1/A(e^{2j\pi f/Fe})|$. It can be seen that the analysis method according to the invention substantially improves the modeling of the spectrum, particularly at high frequencies (f > 4 kHz). The general slope of the spectrum and its high-frequency formants are better respected by the multi-stage analysis process.
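The modeled envelope $|1/A(e^{2j\pi f/Fe})|$ plotted in such a comparison can be evaluated as in the following Python sketch. The function name `envelope_db` and the decibel scaling are assumptions made here for illustration; the coefficients used in the example are arbitrary and are not those of Figure 2.

```python
import cmath
import math


def envelope_db(a, freq_hz, fe_hz=16000.0):
    """20*log10 |1/A(e^{2j*pi*f/Fe})| for A(z) = 1 + sum_i a[i-1] * z^-i."""
    w = 2.0 * math.pi * freq_hz / fe_hz
    a_val = 1.0 + sum(c * cmath.exp(-1j * w * (i + 1)) for i, c in enumerate(a))
    return -20.0 * math.log10(abs(a_val))


# A single real pole at 0.9 tilts the envelope towards the low frequencies.
for f in (0.0, 2000.0, 4000.0, 8000.0):
    print(f, round(envelope_db([-0.9], f), 2))
```

A first-order filter of this kind already reproduces a general spectral slope, which is the role played by the low-order first stage in the two-stage analysis.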

The invention is described below in its application to a CELP-type speech coder.

The speech synthesis process implemented in a CELP coder and decoder is illustrated in Figure 3. An excitation generator 10 delivers an excitation code $c_k$ belonging to a predetermined codebook in response to an index k. An amplifier 12 multiplies this excitation code by an excitation gain β, and the resulting signal is fed to a long-term synthesis filter 14. The output signal u of the filter 14 is in turn fed to a short-term synthesis filter 16, whose output ŝ constitutes what is considered here as the synthetic speech signal. This synthetic signal is applied to a post-filter 17 intended to improve the subjective quality of the reconstructed speech. Post-filtering techniques are well known in the field of speech coding (see J.H. Chen and A. Gersho: "Adaptive postfiltering for quality enhancement of coded speech", IEEE Trans. on Speech and Audio Processing, Vol. 3-1, pages 59-71, January 1995). In the example shown, the coefficients of the post-filter 17 are obtained from the LPC parameters characterizing the short-term synthesis filter 16. It will be understood that, as in some current CELP decoders, the post-filter 17 could also include a long-term post-filtering component.

The aforementioned signals are digital signals represented, for example, by 16-bit words at a sampling rate Fe equal, for example, to 16 kHz for a wideband coder (50-7000 Hz). The synthesis filters 14, 16 are in general purely recursive filters. The long-term synthesis filter 14 typically has a transfer function of the form 1/B(z) with $B(z) = 1 - G z^{-T}$. The delay T and the gain G constitute long-term prediction (LTP) parameters which are determined adaptively by the coder. The LPC parameters defining the short-term synthesis filter 16 are determined at the coder by a linear prediction analysis of the speech signal. In conventional CELP coders and decoders, the transfer function of the filter 16 is generally of the form 1/A(z) with A(z) of the form (1). The present invention proposes to adopt a similar form of the transfer function, in which A(z) is decomposed according to (7) as indicated above. By way of example, the parameters of the various stages can be q = 2, M1 = 2, M2 = 13 (M = M1 + M2 = 15).

The term "excitation signal" here denotes the signal u(n) applied to the short-term synthesis filter 16. This excitation signal comprises an LTP component G·u(n-T) and a residual component, or innovation sequence, $\beta c_k(n)$. In an analysis-by-synthesis coder, the parameters characterizing the residual component and, optionally, the LTP component are evaluated in closed loop, using a perceptual weighting filter.
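The synthesis chain described above (gain, long-term filter 1/B(z), short-term filter 1/A(z)) can be sketched for one sub-frame in Python. This is a simplified illustration: the function name `celp_synthesize` is hypothetical, the delay T is assumed to be an integer, the post-filter is omitted, and the past samples are assumed to be long enough (at least T excitation samples and M synthetic samples).

```python
def celp_synthesize(code, beta, gain, lag, a, past_u, past_s):
    """Synthesize one sub-frame.

    code: samples c_k(n); beta: excitation gain; gain, lag: LTP parameters G and T;
    a: a_1..a_M of A(z) = 1 + sum a_i z^-i; past_u, past_s: earlier samples, most recent last.
    """
    u = list(past_u)
    s = list(past_s)
    for c in code:
        # Long-term synthesis 1/B(z), B(z) = 1 - G z^-T: u(n) = beta*c_k(n) + G*u(n-T)
        u_n = beta * c + gain * u[-lag]
        u.append(u_n)
        # Short-term synthesis 1/A(z): s(n) = u(n) - sum_i a_i * s(n-i)
        s_n = u_n - sum(a_i * s[-(i + 1)] for i, a_i in enumerate(a))
        s.append(s_n)
    n = len(code)
    return u[-n:], s[-n:]


excitation, synthetic = celp_synthesize(
    code=[1.0, 0.0, 0.0, 0.0], beta=2.0, gain=0.5, lag=2,
    a=[-0.5], past_u=[0.0, 0.0], past_s=[0.0])
print(excitation)   # [2.0, 0.0, 1.0, 0.0]
print(synthetic)    # [2.0, 1.0, 1.5, 0.75]
```

With a multi-stage analysis, the list `a` would hold the coefficients of the composite filter A(z), or the stages could equally be applied as a cascade of recursive filters.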

Figure 4 shows the block diagram of a CELP coder. The speech signal s(n) is a digital signal, for example supplied by an analog-to-digital converter 20 processing the amplified and filtered output signal of a microphone 22. The signal s(n) is digitized in successive frames of Λ samples, themselves divided into sub-frames, or excitation frames, of L samples (for example Λ = 160, L = 32).

The LPC, LTP and EXC parameters (index k and excitation gain β) are obtained at the coder by three respective analysis modules 24, 26, 28. These parameters are then quantized in a known manner with a view to efficient digital transmission, then fed to a multiplexer 30 which forms the output signal of the coder. These parameters are also supplied to a module 32 for calculating the initial states of certain filters of the coder. This module 32 essentially comprises a decoding chain such as that shown in Figure 3. Like the decoder, the module 32 operates on the basis of the quantized LPC, LTP and EXC parameters. If the LPC parameters are interpolated at the decoder, as is common, the same interpolation is performed by the module 32. The module 32 makes the previous states of the synthesis filters 14, 16 of the decoder known at the coder, these states being determined from the synthesis and excitation parameters preceding the sub-frame considered.

In a first step of the coding process, the short-term analysis module 24 determines the LPC parameters defining the short-term synthesis filter by analyzing the short-term correlations of the speech signal s(n). This determination is carried out, for example, once per frame of Λ samples, so as to adapt to the evolution of the spectral content of the speech signal. According to the invention, it consists in implementing the analysis method illustrated in Figure 1 with $s^0(n) = s(n)$.

The next step of the coding consists in determining the long-term prediction (LTP) parameters. These are determined, for example, once per sub-frame of L samples. A subtractor 34 subtracts from the speech signal s(n) the zero-input response of the short-term synthesis filter 16. This response is determined by a filter 36 with transfer function 1/A(z), whose coefficients are given by the LPC parameters determined by the module 24, and whose initial states ŝ are supplied by the module 32 so as to correspond to the last M = M1 + ... + Mq samples of the synthetic signal. The output signal of the subtractor 34 is fed to a perceptual weighting filter 38 whose role is to emphasize the portions of the spectrum where errors are most perceptible, that is to say the inter-formant regions.
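The role of the filter 36 and the subtractor 34 can be illustrated with the following Python sketch. The function names are hypothetical, and `past_s` stands for the last M samples of the synthetic signal, assumed here to be supplied by a local decoder such as the module 32.

```python
def zero_input_response(a, past_s, length):
    """Output of 1/A(z) for a zero input, starting from the last synthetic samples.

    a: a_1..a_M of A(z) = 1 + sum a_i z^-i; past_s: at least M samples, most recent last.
    """
    s = list(past_s)
    for _ in range(length):
        # With u(n) = 0, s(n) = -sum_i a_i * s(n-i)
        s.append(-sum(a_i * s[-(i + 1)] for i, a_i in enumerate(a)))
    return s[len(past_s):]


def remove_ringing(speech, a, past_s):
    """Subtract the zero-input response from the speech sub-frame (subtractor 34)."""
    zir = zero_input_response(a, past_s, len(speech))
    return [x - z for x, z in zip(speech, zir)]


print(zero_input_response([-0.5], [4.0], 3))          # [2.0, 1.0, 0.5]
print(remove_ringing([2.0, 2.0, 2.0], [-0.5], [4.0])) # [0.0, 1.0, 1.5]
```

Removing this ringing lets the closed-loop searches compare the target with the zero-state response of the filters to the candidate excitation only.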

The transfer function W(z) of the perceptual weighting filter 38 is of the form W(z) = AN(z)/AP(z), where AN(z) and AP(z) are FIR (finite impulse response) transfer functions of order M. The respective coefficients $b_i$ and $c_i$ (1≤i≤M) of the functions AN(z) and AP(z) are calculated for each frame by a perceptual weighting evaluation module 39, which supplies them to the filter 38. A first possibility is to take $AN(z) = A(z/\gamma_1)$ and $AP(z) = A(z/\gamma_2)$ with $0 \le \gamma_2 \le \gamma_1 \le 1$, which amounts to the usual form (2) with A(z) of the form (7). In the case of a wideband signal with q = 2, M1 = 2 and M2 = 13, the choice $\gamma_1 = 0.92$ and $\gamma_2 = 0.6$ was found to give good results.

The invention nevertheless makes it possible, with very little additional computation, to have greater flexibility in shaping the quantization noise, by adopting the form (6) for W(z), namely:

$W(z) = \prod_{p=1}^{q} \frac{A^p(z/\gamma_1^p)}{A^p(z/\gamma_2^p)}$

In the case of a wideband signal with q = 2, M1 = 2 and M2 = 13, the choice $\gamma_1^1 = 0.9$, $\gamma_2^1 = 0.65$, $\gamma_1^2 = 0.95$ and $\gamma_2^2 = 0.75$ was found to give good results. The term $A^1(z/\gamma_1^1)/A^1(z/\gamma_2^1)$ makes it possible to adjust the general slope of the filter 38, while the term $A^2(z/\gamma_1^2)/A^2(z/\gamma_2^2)$ makes it possible to adjust the masking at the formants.
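The coefficients $b_i$ and $c_i$ of AN(z) and AP(z) for the per-stage weighting can be computed as in the Python sketch below. It relies on the identity $A(z/\gamma) = 1 + \sum_i a_i \gamma^i z^{-i}$; the function names and the toy coefficient values are assumptions made for illustration.

```python
def expand(a, gamma):
    """Coefficients of A(z/gamma): a_i becomes a_i * gamma**i."""
    return [c * gamma ** (i + 1) for i, c in enumerate(a)]


def poly_mul(x, y):
    """Coefficients of the product of two polynomials in z^-1."""
    out = [0.0] * (len(x) + len(y) - 1)
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[i + j] += xi * yj
    return out


def weighting_filter(stages, gammas):
    """Return (AN, AP) for W(z) = prod_p A^p(z/g1_p) / A^p(z/g2_p).

    stages: one coefficient list per stage; gammas: one (g1, g2) pair per stage.
    Both results include the leading coefficient 1.
    """
    num, den = [1.0], [1.0]
    for a, (g1, g2) in zip(stages, gammas):
        num = poly_mul(num, [1.0] + expand(a, g1))
        den = poly_mul(den, [1.0] + expand(a, g2))
    return num, den


# Toy two-stage filter with the gamma values quoted above.
an, ap = weighting_filter([[-1.0, 0.5], [0.2]], [(0.9, 0.65), (0.95, 0.75)])
print(an)
print(ap)
```

When every stage uses the same pair of factors, the result coincides with the single-filter weighting A(z/γ₁)/A(z/γ₂) applied to the composite A(z), so the per-stage form strictly generalizes it.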

The closed-loop LTP analysis performed by the module 26 consists, in conventional fashion, in selecting for each subframe the delay T that maximizes the normalized correlation:

$\frac{\left[\sum_{n=0}^{L-1} x'(n)\, y_T(n)\right]^2}{\sum_{n=0}^{L-1} \left[y_T(n)\right]^2}$

where x'(n) denotes the output signal of the filter 38 during the subframe in question, and $y_T(n)$ denotes the convolution product u(n-T)*h'(n). In the above expression, h'(0), h'(1), ..., h'(L-1) denotes the impulse response of the weighted synthesis filter, of transfer function W(z)/A(z). This impulse response h' is obtained by an impulse response calculation module 40 from the coefficients $b_i$ and $c_i$ supplied by the module 39 and from the LPC parameters determined for the subframe, where appropriate after quantization and interpolation. The samples u(n-T) are the past states of the long-term synthesis filter 14, supplied by the module 32. For delays T shorter than the length of a subframe, the missing samples u(n-T) are obtained by interpolation from the past samples, or from the speech signal. The delays T, integer or fractional, are selected within a given window. To reduce the closed-loop search range, and hence the number of convolutions $y_T(n)$ to be calculated, an open-loop delay T' can first be determined, for example once per frame, and the closed-loop delays for each subframe then selected within a reduced interval around T'. The open-loop search consists more simply in determining the delay T' that maximizes the autocorrelation of the speech signal s(n), possibly filtered by the inverse filter of transfer function A(z). Once the delay T has been determined, the long-term prediction gain G is obtained by:

$G = \frac{\sum_{n=0}^{L-1} x'(n)\, y_T(n)}{\sum_{n=0}^{L-1} \left[y_T(n)\right]^2}$
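The closed-loop search described above can be sketched as follows. This is a simplified illustration, not the patent's implementation: it assumes integer delays no shorter than the subframe length L (so no missing samples need interpolating), and the names `ltp_search`, `past` and `h_w` are hypothetical.

```python
import random


def ltp_search(x_w, past, h_w, delays):
    """Return (T, G) maximizing the normalized correlation between the
    weighted target x_w and y_T = u(n-T) * h'(n), for integer T >= len(x_w).

    past -- past excitation samples, most recent last
    h_w  -- impulse response h' of the weighted synthesis filter
    """
    L = len(x_w)
    best_T, best_G, best_score = None, 0.0, -1.0
    for T in delays:
        start = len(past) - T
        seg = past[start:start + L]                      # u(n-T), n = 0..L-1
        y = [sum(seg[i] * h_w[n - i] for i in range(n + 1)) for n in range(L)]
        num = sum(a * b for a, b in zip(x_w, y))
        den = sum(b * b for b in y)
        if den > 0.0 and num * num / den > best_score:
            best_T, best_G, best_score = T, num / den, num * num / den
    return best_T, best_G


# Synthetic check: build a target from a known delay and gain.
rng = random.Random(0)
past = [rng.gauss(0.0, 1.0) for _ in range(160)]
h_w = [0.8 ** n for n in range(20)]
L, true_T = 20, 57
seg = past[len(past) - true_T:len(past) - true_T + L]
target = [0.8 * sum(seg[i] * h_w[n - i] for i in range(n + 1)) for n in range(L)]
T, G = ltp_search(target, past, h_w, range(20, 144))
```

Because the target is an exact scaled copy of the filtered past excitation at delay 57, the search recovers that delay and a gain close to 0.8. A practical coder would typically restrict the range around an open-loop estimate T', as the text describes.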

To search for the CELP excitation relating to a subframe, the signal $G\,y_T(n)$, which has been calculated by the module 26 for the optimal delay T, is first subtracted from the signal x'(n) by the subtractor 42. The resulting signal x(n) is fed to a backward filter 44, which supplies a signal D(n) given by:

$D(n) = \sum_{i=n}^{L-1} x(i)\, h(i-n)$

where h(0), h(1), ..., h(L-1) denotes the impulse response of the filter composed of the synthesis filters and the perceptual weighting filter, calculated by the module 40. In other words, the composite filter has the transfer function W(z)/[A(z)·B(z)]. In matrix notation, this gives:

$D = (D(0), D(1), \ldots, D(L-1)) = x \cdot H$

with

$x = (x(0), x(1), \ldots, x(L-1))$

and

$H = \begin{pmatrix} h(0) & 0 & \cdots & 0 \\ h(1) & h(0) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ h(L-1) & \cdots & h(1) & h(0) \end{pmatrix}$

The vector D constitutes a target vector for the excitation search module 28. This module 28 determines a codeword of the codebook that maximizes the normalized correlation $P_k^2/\alpha_k^2$, in which:

$P_k = D \cdot c_k^T$

$\alpha_k^2 = c_k \cdot H^T \cdot H \cdot c_k^T = c_k \cdot U \cdot c_k^T$

Once the optimal index k has been determined, the excitation gain β is taken equal to $\beta = P_k/\alpha_k^2$.
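The backward filtering and codebook search can be illustrated with the sketch below. It assumes H is the lower-triangular Toeplitz matrix built from h, so that D = x·H is a correlation of x with h and H·c_k^T is a truncated convolution; the toy codebook and all function names are hypothetical.

```python
def backward_filter(x, h):
    """D(n) = sum_{i=n}^{L-1} x(i) h(i-n), i.e. D = x.H for Toeplitz H."""
    L = len(x)
    return [sum(x[i] * h[i - n] for i in range(n, L)) for n in range(L)]


def synth(c, h):
    """Truncated convolution of codeword c with h (the product H.c^T)."""
    return [sum(h[n - j] * c[j] for j in range(n + 1)) for n in range(len(c))]


def search_codebook(x, h, codebook):
    """Return (k, beta) maximizing P_k^2 / alpha_k^2, with beta = P_k / alpha_k^2."""
    D = backward_filter(x, h)                 # computed once per subframe
    best_k, best_beta, best_score = None, 0.0, -1.0
    for k, c in enumerate(codebook):
        P = sum(d * cj for d, cj in zip(D, c))
        alpha2 = sum(v * v for v in synth(c, h))
        if alpha2 > 0.0 and P * P / alpha2 > best_score:
            best_k, best_beta, best_score = k, P / alpha2, P * P / alpha2
    return best_k, best_beta


# Toy example: the target is codeword 1 filtered by h and scaled by 2.
h = [1.0, 0.5, 0.25, 0.125]
codebook = [[1, 0, 0, 0], [0, 1, 0, -1], [1, 1, 1, 1]]
x = [2.0 * v for v in synth(codebook[1], h)]
k, beta = search_codebook(x, h, codebook)
```

The point of computing D first is that each P_k then costs a single inner product with the unfiltered codeword, instead of one filtering operation per codeword.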

With reference to FIG. 3, the CELP decoder comprises a demultiplexer 8 receiving the bit stream output by the coder. The quantized values of the excitation parameters EXC and of the synthesis parameters LTP and LPC are supplied to the generator 10, the amplifier 12 and the filters 14, 16 to reconstruct the synthetic signal ŝ, which is passed through the post-filter 17, then converted to analog by the converter 18 before being amplified and applied to a loudspeaker 19 to reproduce the original speech.

In the case of the decoder of FIG. 3, the LPC parameters consist, for example, of quantization indexes of the reflection coefficients $r_i^p$ (also called partial correlation or PARCOR coefficients) relating to the various linear prediction stages. A module 15 recovers the quantized values of the $r_i^p$ from the quantization indexes and converts them to supply the q sets of linear prediction coefficients. This conversion is performed, for example, by the same recursive method as in the Levinson-Durbin algorithm.
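A minimal sketch of such a conversion, using the step-up recursion found inside the Levinson-Durbin algorithm, is shown below. The sign convention is an assumption: A(z) = 1 + Σ a_i z^-i with the last coefficient of each order equal to the reflection coefficient; the convention actually used for $r_i^p$ may differ. The function names are hypothetical.

```python
def reflection_to_lpc(r):
    """Convert reflection coefficients [r_1..r_M] to predictor
    coefficients [a_1..a_M] by the step-up recursion:
        a_i(m) = a_i(m-1) + r_m * a_{m-i}(m-1),  a_m(m) = r_m
    """
    a = []
    for m, k in enumerate(r, start=1):
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return a


def decode_lpc_sets(reflection_sets):
    """Apply the conversion independently to each of the q stages."""
    return [reflection_to_lpc(r) for r in reflection_sets]


# Hypothetical dequantized reflection coefficients for two stages.
sets = decode_lpc_sets([[0.5, -0.2], [0.5, -0.2, 0.1]])
```

Each stage is converted on its own, which matches the idea that the decoder receives q independent sets of parameters.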

The sets of coefficients $a_i^p$ are supplied to the short-term synthesis filter 16, which consists of a succession of q filters/stages with transfer functions $1/A^1(z), \ldots, 1/A^q(z)$ given by relation (4). The filter 16 could also be implemented as a single stage with transfer function 1/A(z) given by relation (1), in which the coefficients $a_i$ have been calculated according to relations (9) to (13).
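The equivalence between the cascaded and single-stage forms can be checked numerically with the sketch below. It assumes that A(z) is the product of the stage polynomials (as the claims state for the synthesis filter) and that each A^p(z) = 1 + Σ a_i^p z^-i; the coefficient values and function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def all_pole(x, a):
    """Filter x by 1/A(z): y(n) = x(n) - sum_i a_i y(n-i), zero initial state."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(ai * y[n - i]
                          for i, ai in enumerate(a, start=1) if n - i >= 0))
    return y


def combine(stages):
    """Coefficients a_1..a_M of A(z) = prod_p A^p(z)."""
    full = [1.0]
    for a in stages:
        full = poly_mul(full, [1.0] + list(a))
    return full[1:]


stages = [[-0.9, 0.2], [0.3, -0.1, 0.05]]   # hypothetical stage coefficients
u = [1.0] + [0.0] * 31                      # unit impulse as excitation
cascade = u
for a in stages:                            # q stages in succession
    cascade = all_pole(cascade, a)
single = all_pole(u, combine(stages))       # one stage of order M = 5
```

Both paths produce the same output up to rounding, so the choice between them is a matter of implementation convenience.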

The sets of coefficients $a_i^p$ are also supplied to the post-filter 17 which, in the example considered, has a transfer function of the form

$H_{PF}(z) = G_{P} \frac{APN(z)}{APP(z)} (1 - \mu r_{1} z^{-1})$

where APN(z) and APP(z) are FIR transfer functions of order M, $G_P$ is a constant gain factor, µ is a positive constant and $r_1$ denotes the first reflection coefficient. The reflection coefficient $r_1$ may be the one associated with the coefficients $a_i$ of the composite synthesis filter, in which case it has to be calculated. Alternatively, $r_1$ may be taken as the first reflection coefficient of the first prediction stage ($r_1 = r_1^1$), with the constant µ adjusted if necessary. For the term APN(z)/APP(z), a first possibility is to take APN(z) = A(z/β₁) and APP(z) = A(z/β₂) with 0≤β₁≤β₂≤1, which reduces to the conventional form (3) with A(z) of the form (7).

As in the case of the perceptual weighting filter of the coder, the invention makes it possible to adopt coefficients β₁ and β₂ that differ from one stage to the next (formula (8)), namely:

$\frac{APN(z)}{APP(z)} = \prod_{p=1}^{q} \frac{A^{p}(z/\beta_{1}^{p})}{A^{p}(z/\beta_{2}^{p})}$

In the case of a wideband signal with q=2, M1=2 and M2=13, the choice $\beta_1^1$=0.7, $\beta_2^1$=0.9, $\beta_1^2$=0.95 and $\beta_2^2$=0.97 was found to give good results.
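A possible way of assembling and applying such a post-filter is sketched below, under stated assumptions: each stage polynomial is A^p(z) = 1 + Σ a_i^p z^-i, the tilt term (1 - µ r₁ z^-1) is folded into the numerator, and the values chosen for the stage coefficients, `r1` and `mu` are purely illustrative. The function names are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def expand(a, beta):
    """Full coefficient list of A(z/beta) for A(z) = 1 + sum_i a[i-1] z^-i."""
    return [1.0] + [c * beta ** i for i, c in enumerate(a, start=1)]


def postfilter_coeffs(stages, betas, r1, mu, gain=1.0):
    """Numerator G_P * APN(z) * (1 - mu*r1*z^-1) and denominator APP(z)."""
    num, den = [1.0], [1.0]
    for a, (b1, b2) in zip(stages, betas):
        num = poly_mul(num, expand(a, b1))
        den = poly_mul(den, expand(a, b2))
    num = poly_mul(num, [1.0, -mu * r1])
    return [gain * v for v in num], den


def pole_zero(x, num, den):
    """y(n) = sum_j num[j] x(n-j) - sum_{i>=1} den[i] y(n-i)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[j] * x[n - j] for j in range(len(num)) if n - j >= 0)
        acc -= sum(den[i] * y[n - i] for i in range(1, len(den)) if n - i >= 0)
        y.append(acc)
    return y


stages = [[-0.9, 0.2], [0.3, -0.1, 0.05]]          # hypothetical coefficients
num, den = postfilter_coeffs(stages, [(0.7, 0.9), (0.95, 0.97)], r1=-0.5, mu=0.3)
out = pole_zero([1.0] + [0.0] * 15, num, den)      # post-filter impulse response
```

Setting every β pair to (1, 1) and µ to 0 makes the numerator and denominator cancel, which gives a convenient sanity check that the filter then passes the signal unchanged.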

The invention has been described above in its application to a forward-adaptation predictive coder, that is to say one in which the audio frequency signal undergoing linear prediction analysis is the input signal of the coder. The invention also applies to backward-adaptation predictive coders/decoders, in which the synthetic signal undergoes linear prediction analysis at both the coder and the decoder (see J.H. Chen et al., "A Low-Delay CELP Coder for the CCITT 16 kbit/s Speech Coding Standard", IEEE J. SAC, Vol. 10, No. 5, pages 830-848, June 1992). FIGS. 5 and 6 show, respectively, a backward-adaptation CELP decoder and CELP coder implementing the present invention. Reference numerals identical to those of FIGS. 3 and 4 have been used to designate similar elements.

The backward-adaptation decoder receives only the quantization values of the parameters defining the excitation signal u(n) to be applied to the short-term synthesis filter 16. In the example considered, these parameters are the index k and the associated gain β, together with the LTP parameters. The synthetic signal ŝ(n) is processed by a multi-stage linear prediction analysis module 124 identical to the module 24 of FIG. 3. The module 124 supplies the LPC parameters to the filter 16 for one or more subsequent frames of the excitation signal, and to the post-filter 17, whose coefficients are obtained as described above.
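The multi-stage analysis performed by a module such as 124 can be sketched as follows. This is a simplified illustration rather than the patent's procedure: it uses the plain autocorrelation method with Levinson-Durbin at each stage, omits any windowing or bandwidth expansion, and takes each stage's input to be the previous stage's input filtered by A^p(z). All names and the test signal are hypothetical.

```python
import random


def autocorr(s, order):
    """Autocorrelations R(0)..R(order) of the sequence s."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s)))
            for k in range(order + 1)]


def levinson(R, order):
    """Levinson-Durbin recursion; returns [a_1..a_order] for A(z) = 1 + sum a_i z^-i."""
    a, err = [], R[0]
    for m in range(1, order + 1):
        if err <= 0.0:                      # degenerate input: pad with zeros
            a = a + [0.0]
            continue
        acc = R[m] + sum(a[i] * R[m - 1 - i] for i in range(m - 1))
        k = -acc / err
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
        err *= 1.0 - k * k
    return a


def inverse_filter(s, a):
    """Filter s by A(z), giving the prediction residual."""
    return [s[n] + sum(ai * s[n - i] for i, ai in enumerate(a, start=1) if n - i >= 0)
            for n in range(len(s))]


def multistage_lpc(s, orders):
    """Run one analysis per stage, feeding each residual to the next stage."""
    stages = []
    for M in orders:
        a = levinson(autocorr(s, M), M)
        stages.append(a)
        s = inverse_filter(s, a)
    return stages


# Hypothetical test signal: first-order autoregressive noise.
rng = random.Random(1)
s, prev = [], 0.0
for _ in range(2000):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    s.append(prev)
stages = multistage_lpc(s, orders=[2, 13])
```

The orders [2, 13] mirror the wideband example given earlier (M1=2, M2=13): a low-order first stage captures the overall spectral tilt and the higher-order second stage models what remains.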

The corresponding coder, shown in FIG. 6, performs the multi-stage linear prediction analysis on the locally generated synthetic signal rather than on the audio signal s(n). It therefore comprises a local decoder 132 consisting essentially of the elements denoted 10, 12, 14, 16 and 124 of the decoder of FIG. 5. In addition to the samples u of the adaptive codebook and the initial states ŝ of the filter 36, the local decoder 132 supplies the LPC parameters obtained by analysis of the synthetic signal, which are used by the perceptual weighting evaluation module 39 and by the module 40 for calculating the impulse responses h and h'. Otherwise, the operation of the coder is identical to that of the coder described with reference to FIG. 4, except that the LPC analysis module 24 is no longer necessary. Only the EXC and LTP parameters are sent to the decoder.

FIGS. 7 and 8 are block diagrams of a CELP decoder and a CELP coder with mixed adaptation. The linear prediction coefficients of the first stage or stages result from a forward analysis of the audio frequency signal performed by the coder, while the linear prediction coefficients of the last stage or stages result from a backward analysis of the synthetic signal performed by the decoder (and by a local decoder provided in the coder). Reference numerals identical to those of FIGS. 3 to 6 have been used to designate similar elements.

The mixed decoder illustrated in FIG. 7 receives the quantization values of the parameters EXC, LTP defining the excitation signal u(n) to be applied to the short-term synthesis filter 16, and the quantization values of the LPC/F parameters determined by the forward analysis performed by the coder. These LPC/F parameters represent $q_F$ sets of linear prediction coefficients $a_1^{F,p}, \ldots, a_{MFp}^{F,p}$ for 1≤p≤$q_F$, and define a first component $1/A^F(z)$ of the transfer function 1/A(z) of the filter 16:

$\frac{1}{A^{F}(z)} = \prod_{p=1}^{q_F} \frac{1}{1 + \sum_{i=1}^{MFp} a_{i}^{F,p} z^{-i}}$

To obtain these LPC/F parameters, the mixed coder shown in FIG. 8 comprises a module 224/F which analyzes the audio frequency signal s(n) to be coded in the manner described with reference to FIG. 1 if $q_F$>1, or in a single stage if $q_F$=1.

The other component $1/A^B(z)$ of the short-term synthesis filter 16, of transfer function $1/A(z) = 1/[A^F(z) \cdot A^B(z)]$, is given by

$\frac{1}{A^{B}(z)} = \prod_{p=1}^{q_B} \frac{1}{1 + \sum_{i=1}^{MBp} a_{i}^{B,p} z^{-i}}$

To determine the coefficients $a_i^{B,p}$, the mixed decoder comprises an inverse filter 200 of transfer function $A^F(z)$, which filters the synthetic signal ŝ(n) produced by the short-term synthesis filter 16 to produce a filtered synthetic signal ŝ⁰(n). A module 224/B performs the linear prediction analysis of this signal ŝ⁰(n) in the manner described with reference to FIG. 1 if $q_B$>1, or in a single stage if $q_B$=1. The LPC/B coefficients thus obtained are supplied to the synthesis filter 16 to define its second component for the following frame. Like the LPC/F coefficients, they are also supplied to the post-filter 17, whose components APN(z) and APP(z) are either of the form APN(z) = A(z/β₁), APP(z) = A(z/β₂), or of the form:

$APN(z) = \prod_{p=1}^{q_F} A^{F,p}(z/\beta_{1}^{F,p}) \prod_{p=1}^{q_B} A^{B,p}(z/\beta_{1}^{B,p})$

$APP(z) = \prod_{p=1}^{q_F} A^{F,p}(z/\beta_{2}^{F,p}) \prod_{p=1}^{q_B} A^{B,p}(z/\beta_{2}^{B,p})$

where $A^{F,p}(z)$ and $A^{B,p}(z)$ denote the stage-p factors of $A^F(z)$ and $A^B(z)$, the pairs of coefficients $\beta_1^{F,p}, \beta_2^{F,p}$ and $\beta_1^{B,p}, \beta_2^{B,p}$ being separately optimizable with $0 \le \beta_1^{F,p} \le \beta_2^{F,p} \le 1$ and $0 \le \beta_1^{B,p} \le \beta_2^{B,p} \le 1$.
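The role of the inverse filter 200 can be illustrated numerically: filtering the output of 1/[A^F(z)·A^B(z)] by A^F(z) leaves a signal shaped only by 1/A^B(z), which is what the backward analysis then models. The sketch below assumes single-stage components ($q_F = q_B = 1$), zero initial filter states and hypothetical coefficient values; the function names are not from the patent.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def fir(x, a):
    """Filter x by A(z) = 1 + sum_i a[i-1] z^-i (the inverse filter)."""
    return [x[n] + sum(ai * x[n - i] for i, ai in enumerate(a, start=1) if n - i >= 0)
            for n in range(len(x))]


def all_pole(x, a):
    """Filter x by 1/A(z), zero initial state."""
    y = []
    for n, xn in enumerate(x):
        y.append(xn - sum(ai * y[n - i]
                          for i, ai in enumerate(a, start=1) if n - i >= 0))
    return y


a_f = [-0.9, 0.2]            # hypothetical forward component A^F(z)
a_b = [0.3, -0.1, 0.05]      # hypothetical backward component A^B(z)
a_full = poly_mul([1.0] + a_f, [1.0] + a_b)[1:]

u = [1.0, -0.5, 0.25] + [0.0] * 29    # some excitation
s_hat = all_pole(u, a_full)           # output of synthesis filter 16
s_hat0 = fir(s_hat, a_f)              # output of inverse filter 200
reference = all_pole(u, a_b)          # excitation shaped by 1/A^B(z) only
```

Here `s_hat0` and `reference` agree up to rounding, which is why the backward module only has to model the part of the spectrum not already captured by the transmitted forward coefficients.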

The local decoder 232 provided in the mixed coder consists essentially of the elements denoted 10, 12, 14, 16, 200 and 224/B of the decoder of FIG. 7. In addition to the samples u of the adaptive codebook and the initial states ŝ of the filter 36, the local decoder 232 supplies the LPC/B parameters which are used, together with the LPC/F parameters supplied by the analysis module 224/F, by the perceptual weighting evaluation module 39 and by the module 40 for calculating the impulse responses h and h'.

The transfer function of the perceptual weighting filter 38 evaluated by the module 39 is either of the form W(z) = A(z/γ₁)/A(z/γ₂), or of the form

$W(z) = \prod_{p=1}^{q_F} \frac{A^{F,p}(z/\gamma_{1}^{F,p})}{A^{F,p}(z/\gamma_{2}^{F,p})} \prod_{p=1}^{q_B} \frac{A^{B,p}(z/\gamma_{1}^{B,p})}{A^{B,p}(z/\gamma_{2}^{B,p})}$

the pairs of coefficients $\gamma_1^{F,p}, \gamma_2^{F,p}$ and $\gamma_1^{B,p}, \gamma_2^{B,p}$ being separately optimizable with $0 \le \gamma_2^{F,p} \le \gamma_1^{F,p} \le 1$ and $0 \le \gamma_2^{B,p} \le \gamma_1^{B,p} \le 1$.

Otherwise, the operation of the mixed coder is identical to that of the coder described with reference to FIG. 4. Only the parameters EXC, LTP and LPC/F are sent to the decoder.

Claims

Method for linear prediction analysis of an audio frequency signal ($s^0(n)$), for determining spectral parameters dependent on a short-term spectrum of the audio frequency signal, the method comprising q successive prediction stages ($5_p$), q being an integer greater than 1, characterized in that, at each prediction stage p (1≤p≤q), parameters are determined which represent a number Mp, predefined for each stage p, of coefficients $a_1^p, \ldots, a_{Mp}^p$ of linear prediction of an input signal of said stage, the audio frequency signal to be analyzed constituting the input signal ($s^0(n)$) of the first stage, and the input signal ($s^p(n)$) of a stage p+1 being constituted by the input signal ($s^{p-1}(n)$) of stage p filtered by a filter of transfer function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

Analysis method according to claim 1, characterized in that the number Mp of linear prediction coefficients increases from one stage to the next.

Method for coding an audio frequency signal, comprising the following steps: - linear prediction analysis of the audio frequency signal (s(n)), digitized in successive frames, to determine parameters (LPC) defining a short-term synthesis filter (16);

- determination of excitation parameters (k, β, LTP) defining an excitation signal (u(n)) to be applied to the short-term synthesis filter (16) to produce a synthetic signal (ŝ(n)) representative of the audio frequency signal; and

- production of quantization values of the parameters defining the short-term synthesis filter and of the excitation parameters, characterized in that the linear prediction analysis is a process with q successive stages ($5_p$), q being an integer greater than 1, said process comprising, at each prediction stage p (1≤p≤q), the determination of parameters representing a number Mp, predefined for each stage p, of coefficients $a_1^p, \ldots, a_{Mp}^p$ of linear prediction of an input signal of said stage, the audio frequency signal to be coded (s(n)) constituting the input signal ($s^0(n)$) of the first stage, and the input signal ($s^p(n)$) of a stage p+1 being constituted by the input signal ($s^{p-1}(n)$) of stage p filtered by a filter of transfer function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

the short-term synthesis filter (16) having a transfer function of the form 1/A(z) with

$A(z) = \prod_{p=1}^{q} A^{p}(z)$

Coding method according to claim 3, characterized in that the number Mp of linear prediction coefficients increases from one stage to the next.

Coding method according to claim 3 or 4, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form W(z) = A(z/γ₁)/A(z/γ₂), where γ₁ and γ₂ denote spectral expansion coefficients such that 0≤γ₂≤γ₁≤1.

Coding method according to claim 3 or 4, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form

$W(z) = \prod_{p=1}^{q} \frac{A^{p}(z/\gamma_{1}^{p})}{A^{p}(z/\gamma_{2}^{p})}$

where $\gamma_1^p$, $\gamma_2^p$ denote pairs of spectral expansion coefficients such that $0 \le \gamma_2^p \le \gamma_1^p \le 1$ for 1≤p≤q.

Method for decoding a bit stream to construct an audio frequency signal coded by said bit stream, characterized in that: - quantization values of parameters (LPC) defining a short-term synthesis filter (16) and of excitation parameters (k, β, LTP) are received, the parameters defining the synthesis filter representing a number q greater than 1 of sets of linear prediction coefficients ($a_i^p$), each set p comprising a predefined number Mp of coefficients;

- an excitation signal (u(n)) is produced on the basis of the quantization values of the excitation parameters; and

- a synthetic audio frequency signal (ŝ(n)) is produced by filtering the excitation signal with a synthesis filter (16) having a transfer function of the form 1/A(z) with

$A(z) = \prod_{p=1}^{q} \left(1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}\right)$

where the coefficients $a_1^p, \ldots, a_{Mp}^p$ correspond to the p-th set of linear prediction coefficients for 1≤p≤q.

Decoding method according to claim 7, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form A(z/β₁)/A(z/β₂), where β₁ and β₂ denote coefficients such that 0≤β₁≤β₂≤1.

Decoding method according to claim 7, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form

$\prod_{p=1}^{q} \frac{A^{p}(z/\beta_{1}^{p})}{A^{p}(z/\beta_{2}^{p})}$

where $\beta_1^p$, $\beta_2^p$ denote pairs of coefficients such that $0 \le \beta_1^p \le \beta_2^p \le 1$ for 1≤p≤q, and $A^p(z)$ represents, for the p-th set of linear prediction coefficients, the function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

Method for coding a first audio frequency signal digitized in successive frames, comprising the following steps: - linear prediction analysis of a second audio frequency signal (ŝ(n)) to determine parameters (LPC) defining a short-term synthesis filter (16);

- determination of excitation parameters (k, β, LTP) defining an excitation signal (u(n)) to be applied to the short-term synthesis filter (16) to produce a synthetic signal (ŝ(n)) representative of the first audio frequency signal, this synthetic signal constituting said second audio frequency signal for at least one following frame; and

- production of quantization values of the excitation parameters, characterized in that the linear prediction analysis is a process with q successive stages ($5_p$), q being an integer greater than 1, said process comprising, at each prediction stage p (1≤p≤q), the determination of parameters representing a number Mp, predefined for each stage p, of coefficients $a_1^p, \ldots, a_{Mp}^p$ of linear prediction of an input signal of said stage, the second audio frequency signal (ŝ(n)) constituting the input signal ($s^0(n)$) of the first stage, and the input signal ($s^p(n)$) of a stage p+1 being constituted by the input signal ($s^{p-1}(n)$) of stage p filtered by a filter of transfer function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

Coding method according to claim 10, characterized in that the number Mp of linear prediction coefficients increases from one stage to the next.

Coding method according to claim 10 or 11, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the first audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form W(z) = A(z/γ₁)/A(z/γ₂), where γ₁ and γ₂ denote spectral expansion coefficients such that 0≤γ₂≤γ₁≤1.

Coding method according to claim 10 or 11, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the first audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form

$W(z) = \prod_{p=1}^{q} \frac{A^{p}(z/\gamma_{1}^{p})}{A^{p}(z/\gamma_{2}^{p})}$

Method for decoding a bit stream to construct, in successive frames, an audio frequency signal coded by said bit stream, characterized in that: - quantization values of excitation parameters (k, β, LTP) are received;

- an excitation signal (u(n)) is produced on the basis of the quantization values of the excitation parameters;

- a synthetic audio frequency signal (ŝ(n)) is produced by filtering the excitation signal with a short-term synthesis filter (16);

- a linear prediction analysis of the synthetic signal (ŝ(n)) is carried out to obtain coefficients of the short-term synthesis filter (16) for at least one following frame, and in that the linear prediction analysis is a process with q successive stages ($5_p$), q being an integer greater than 1, said process comprising, at each prediction stage p (1≤p≤q), the determination of parameters representing a number Mp, predefined for each stage p, of coefficients $a_1^p, \ldots, a_{Mp}^p$ of linear prediction of an input signal of said stage, the synthetic signal (ŝ(n)) constituting the input signal ($s^0(n)$) of the first stage, and the input signal ($s^p(n)$) of a stage p+1 being constituted by the input signal ($s^{p-1}(n)$) of stage p filtered by a filter of transfer function

$A^{p}(z) = 1 + \sum_{i=1}^{Mp} a_{i}^{p} z^{-i}$

Decoding method according to claim 14, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form A(z/β₁)/A(z/β₂), where β₁ and β₂ denote coefficients such that 0≤β₁≤β₂≤1.

Decoding method according to claim 14, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form

$\prod_{p=1}^{q} \frac{A^{p}(z/\beta_{1}^{p})}{A^{p}(z/\beta_{2}^{p})}$

where $\beta_1^p$, $\beta_2^p$ denote pairs of coefficients such that $0 \le \beta_1^p \le \beta_2^p \le 1$ for 1≤p≤q.

Method for coding a first digital audio signal digitized in successive frames, characterized in that it comprises the following steps: - analysis by linear prediction of the first audio frequency signal (s (n)) to determine parameters (LPC / F) defining a first component of a short-term synthesis filter (16);

- determination of excitation parameters (k, β, LTP) defining an excitation signal (u(n)) to be applied to the short-term synthesis filter (16) to produce a synthetic signal (ŝ(n)) representative of the first audio frequency signal;

- production of quantization values of the parameters defining the first component of the short-term synthesis filter and of the excitation parameters;

- filtering of the synthetic signal (ŝ(n)) by a filter whose transfer function corresponds to the inverse of the transfer function of the first component of the short-term synthesis filter; and

- analysis by linear prediction of the filtered synthetic signal (ŝ⁰(n)) to obtain coefficients of a second component of the short-term synthesis filter for at least one following frame, in that the linear prediction analysis of the first audio frequency signal (s(n)) is a process with $q_F$ successive stages ($5_p$), $q_F$ being an integer at least equal to 1, said process with $q_F$ stages comprising, at each prediction stage p ($1 \le p \le q_F$), the determination of parameters representing a number $MF_p$, predefined for each stage p, of coefficients $a_1^{F,p}, \ldots, a_{MF_p}^{F,p}$ of linear prediction of an input signal of said stage, the first audio frequency signal (s(n)) constituting the input signal ($s^0(n)$) of the first stage of the process with $q_F$ stages, and the input signal ($s^p(n)$) of a stage p+1 of the process with $q_F$ stages being constituted by the input signal ($s^{p-1}(n)$) of stage p of the process with $q_F$ stages filtered by a filter with transfer function

$$A^{F,p}(z) = 1 + \sum_{i=1}^{MF_p} a_i^{F,p} z^{-i},$$

the first component of the short-term synthesis filter (16) having a transfer function of the form $1/A^F(z)$ with

$$A^F(z) = \prod_{p=1}^{q_F} A^{F,p}(z),$$

and in that the analysis by linear prediction of the filtered synthetic signal is a process with $q_B$ successive stages ($5_p$), $q_B$ being an integer at least equal to 1, said process with $q_B$ stages comprising, at each prediction stage p ($1 \le p \le q_B$), the determination of parameters representing a number $MB_p$, predefined for each stage p, of coefficients $a_1^{B,p}, \ldots, a_{MB_p}^{B,p}$ of linear prediction of an input signal of said stage, the filtered synthetic signal (ŝ⁰(n)) constituting the input signal ($s^0(n)$) of the first stage of the process with $q_B$ stages, and the input signal ($s^p(n)$) of a stage p+1 of the process with $q_B$ stages being constituted by the input signal ($s^{p-1}(n)$) of stage p of the process with $q_B$ stages filtered by a filter with transfer function

$$A^{B,p}(z) = 1 + \sum_{i=1}^{MB_p} a_i^{B,p} z^{-i},$$

the second component of the short-term synthesis filter (16) having a transfer function of the form $1/A^B(z)$ with

$$A^B(z) = \prod_{p=1}^{q_B} A^{B,p}(z),$$

and the short-term synthesis filter (16) having a transfer function of the form $1/A(z)$ with $A(z) = A^F(z) \cdot A^B(z)$.
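
For illustration only: because a product of transfer functions in z corresponds to a convolution of their coefficient sequences, the overall polynomial $A(z) = A^F(z) \cdot A^B(z)$ can be formed by multiplying the per-stage polynomials together. The Python sketch below assumes each stage is given as a list `[1, a_1, ..., a_M]`; the function names are hypothetical.

```python
def poly_mul(a, b):
    """Coefficients of the product of two polynomials in z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cascade(stages):
    """Multiply a list of per-stage polynomials into a single A(z)."""
    total = [1.0]
    for a in stages:
        total = poly_mul(total, a)
    return total


def overall_filter(forward_stages, backward_stages):
    """A(z) = A^F(z) * A^B(z), each component being a product of its stages."""
    return poly_mul(cascade(forward_stages), cascade(backward_stages))
```

In practice the stages may equally be applied one after another as a cascade of filters without ever forming the product; the explicit product is shown only to make the factorization concrete.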

Coding method according to claim 17, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the first audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$, where $\gamma_1$ and $\gamma_2$ denote spectral expansion coefficients such that $0 \le \gamma_2 \le \gamma_1 \le 1$.
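
As a rough sketch of the selection criterion described above, the Python code below computes the energy of the difference between the target and a synthetic frame after weighting by $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$, and picks the candidate with the smallest weighted error. The candidate list stands in for whatever excitation search the coder actually performs; it, the zero initial filter state, and the function names are assumptions made for illustration.

```python
def weighted_error_energy(a, gamma1, gamma2, target, synthetic):
    """Energy of (target - synthetic) filtered by A(z/gamma1) / A(z/gamma2)."""
    num = [c * gamma1 ** i for i, c in enumerate(a)]  # A(z/gamma1)
    den = [c * gamma2 ** i for i, c in enumerate(a)]  # A(z/gamma2)
    diff = [t - s for t, s in zip(target, synthetic)]
    w = []
    for n in range(len(diff)):
        acc = sum(num[i] * diff[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[j] * w[n - j] for j in range(1, len(den)) if n - j >= 0)
        w.append(acc)
    return sum(v * v for v in w)


def pick_best(a, gamma1, gamma2, target, candidates):
    """Index of the candidate synthetic frame with the least weighted error."""
    energies = [weighted_error_energy(a, gamma1, gamma2, target, c)
                for c in candidates]
    return min(range(len(candidates)), key=energies.__getitem__)
```

An exhaustive comparison like this is the simplest possible search; it is shown only to make the minimization criterion concrete.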

Coding method according to claim 17, characterized in that at least some of the excitation parameters are determined by minimizing the energy of an error signal resulting from the filtering of the difference between the first audio frequency signal (s(n)) and the synthetic signal (ŝ(n)) by at least one perceptual weighting filter (38) whose transfer function is of the form

$$W(z) = \left[ \prod_{p=1}^{q_F} \frac{A^{F,p}(z/\gamma_1^{F,p})}{A^{F,p}(z/\gamma_2^{F,p})} \right] \cdot \left[ \prod_{p=1}^{q_B} \frac{A^{B,p}(z/\gamma_1^{B,p})}{A^{B,p}(z/\gamma_2^{B,p})} \right]$$

where $\gamma_1^{F,p}$, $\gamma_2^{F,p}$ denote pairs of spectral expansion coefficients such that $0 \le \gamma_2^{F,p} \le \gamma_1^{F,p} \le 1$ for $1 \le p \le q_F$, and $\gamma_1^{B,p}$, $\gamma_2^{B,p}$ denote pairs of spectral expansion coefficients such that $0 \le \gamma_2^{B,p} \le \gamma_1^{B,p} \le 1$ for $1 \le p \le q_B$.

Method for decoding a bit stream to construct in successive frames an audio frequency signal coded by said bit stream, characterized in that: - quantization values of parameters (LPC/F) defining a first component of a short-term synthesis filter (16) and of excitation parameters (k, β, LTP) are received, the parameters defining the first component of the short-term synthesis filter representing a number $q_F$, at least equal to 1, of sets of linear prediction coefficients $a_1^{F,p}, \ldots, a_{MF_p}^{F,p}$ for $1 \le p \le q_F$, each set p comprising a predefined number $MF_p$ of coefficients, the first component of the short-term synthesis filter (16) having a transfer function of the form $1/A^F(z)$ with

$$A^F(z) = \prod_{p=1}^{q_F} A^{F,p}(z), \qquad A^{F,p}(z) = 1 + \sum_{i=1}^{MF_p} a_i^{F,p} z^{-i};$$

- a synthetic audio frequency signal (ŝ(n)) is produced by filtering the excitation signal by a short-term synthesis filter (16) with transfer function $1/A(z)$ with $A(z) = A^F(z) \cdot A^B(z)$, $1/A^B(z)$ representing the transfer function of a second component of the short-term synthesis filter (16);

- the synthetic signal (ŝ(n)) is filtered by a filter with transfer function $A^F(z)$; and

- an analysis is carried out by linear prediction of the filtered synthetic signal (ŝ⁰(n)) to obtain coefficients of the second component of the short-term synthesis filter (16) for at least one following frame, and in that the linear prediction analysis of the filtered synthetic signal is a process with $q_B$ successive stages ($5_p$), $q_B$ being an integer at least equal to 1, said process comprising, at each prediction stage p ($1 \le p \le q_B$), the determination of parameters representing a number $MB_p$, predefined for each stage p, of coefficients $a_1^{B,p}, \ldots, a_{MB_p}^{B,p}$ of linear prediction of an input signal of said stage, the filtered synthetic signal (ŝ⁰(n)) constituting the input signal ($s^0(n)$) of the first stage, and the input signal ($s^p(n)$) of a stage p+1 being constituted by the input signal ($s^{p-1}(n)$) of stage p filtered by a filter with transfer function

$$A^{B,p}(z) = 1 + \sum_{i=1}^{MB_p} a_i^{B,p} z^{-i}$$

Decoding method according to claim 20, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form $A(z/\beta_1)/A(z/\beta_2)$, where $\beta_1$ and $\beta_2$ denote coefficients such that $0 \le \beta_1 \le \beta_2 \le 1$.

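Decoding method according to claim 20, characterized in that said synthetic audio frequency signal (ŝ(n)) is applied to a post-filter (17) whose transfer function ($H_{PF}(z)$) includes a term of the form

$$\left[ \prod_{p=1}^{q_F} \frac{A^{F,p}(z/\beta_1^{F,p})}{A^{F,p}(z/\beta_2^{F,p})} \right] \cdot \left[ \prod_{p=1}^{q_B} \frac{A^{B,p}(z/\beta_1^{B,p})}{A^{B,p}(z/\beta_2^{B,p})} \right]$$

where $\beta_1^{F,p}$, $\beta_2^{F,p}$ denote pairs of coefficients such that $0 \le \beta_1^{F,p} \le \beta_2^{F,p} \le 1$ for $1 \le p \le q_F$, and $\beta_1^{B,p}$, $\beta_2^{B,p}$ denote pairs of coefficients such that $0 \le \beta_1^{B,p} \le \beta_2^{B,p} \le 1$ for $1 \le p \le q_B$.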