EP0734013B1

EP0734013B1 - Determination of an excitation vector in a CELP coder

Info

Publication number: EP0734013B1
Application number: EP96410028A
Authority: EP
Inventors: Mustapha Bouraoui
Original assignee: STMicroelectronics SA; SGS Thomson Microelectronics SA
Current assignee: STMicroelectronics SA
Priority date: 1995-03-24
Filing date: 1996-03-19
Publication date: 2001-08-22
Anticipated expiration: 2016-03-19
Also published as: US5719994A; JPH0990996A; DE69614594D1; EP0734013A3; FR2732148B1; EP0734013A2; FR2732148A1

Description

La présente invention concerne la compression des signaux de parole à transmettre sur une ligne téléphonique et plus particulièrement la détermination d'un vecteur d'excitation dans le cadre d'une compression selon la méthode de la Prédiction Linéaire Excitée par Code (CELP).The present invention relates to the compression of speech signals to be transmitted over a telephone line and more particularly the determination of an excitation vector as part of a compression using the Prediction method Linear Excited by Code (CELP).

La figure 1 représente très schématiquement un circuit de compression CELP. Un tel circuit est basé sur une modélisation des cordes vocales et de la chambre de résonance constituée par les cavités buccale, de la gorge et du larynx. Une telle méthode de compression est donc optimisée pour le traitement des signaux de parole.Figure 1 very schematically represents a circuit CELP compression. Such a circuit is based on modeling vocal cords and the built-up resonance chamber through the oral cavity, throat and larynx. Such a compression method is therefore optimized for the treatment of speech signals.

Les cavités buccale, de la gorge et du larynx sont modélisées par un filtre de "prédiction linéaire" 10 dont la fonction de transfert comporte en général dix pôles. Les cordes vocales sont modélisées par une excitation E traitée par un filtre en peigne 12.The oral, throat and larynx cavities are modeled by a "linear prediction" filter 10 whose transfer function generally has ten poles. The ropes are modeled by an excitation E treated by a comb filter 12.

Un signal de parole numérisé S est analysé par trames par un circuit d'analyse 14. A chaque trame, le circuit d'analyse 14 détermine des coefficients a₁ à a₁₀ de la fonction de transfert du filtre 10, le pas p du filtre en peigne 12, et un gain G appliqué à l'excitation E en 16 à l'entrée du filtre 12. Les valeurs a_i, p et G sont calculées à chaque trame pour tenir compte respectivement des variations de la cavité buccale, du spectre de fréquence des cordes vocales et de l'amplitude du son. On tente de cette manière d'obtenir que la sortie du filtre 10 soit égale au signal S. Alors, au lieu de transmettre les échantillons du signal S, on transmet les coefficients a_i, p, et G afin qu'un décodeur qui reçoit ces coefficients reconstitue les trames correspondantes du signal S.A digitized speech signal S is analyzed in frames by an analysis circuit 14. At each frame, the analysis circuit 14 determines coefficients a ₁ to a ₁₀ of the transfer function of the filter 10, the step p of the filter in comb 12, and a gain G applied to the excitation E at 16 at the input of the filter 12. The values a _i , p and G are calculated for each frame to take account respectively of the variations of the oral cavity, of the spectrum frequency of the vocal cords and the amplitude of the sound. In this way an attempt is made to obtain that the output of the filter 10 is equal to the signal S. Then, instead of transmitting the samples of the signal S, the coefficients a _i , p, and G are transmitted so that a decoder which receives these coefficients reconstruct the corresponding frames of the signal S.

Bien entendu, le décodeur doit également connaítre l'excitation E à utiliser. La détermination des coefficients ai, p et G ne pose pas de problème particulier. Toutefois, la procédure de recherche de l'excitation optimale reste la plus lourde en termes de charge de calcul et il est toujours d'un grand intérêt de la simplifier même au prix d'une sensible réduction de la qualité de la compression.Of course, the decoder must also know the excitation E to use. The determination of the coefficients ai, p and G does not pose any particular problem. However, the procedure finding the optimal excitement remains the heaviest in terms of computational load and it's still a big interest to simplify it even at the cost of a significant reduction the quality of the compression.

A l'origine du codage CELP, l'excitation E était sélectionnée dans une table 18 (appelée "codebook") contenant plusieurs excitations possibles qui représentaient, en fait, des morceaux de bruit blanc. Dans ce cas, un circuit de commande 20 balaye la table 18 jusqu'à ce que la différence e, formée en 22, entre la trame courante du signal S et la trame correspondante en sortie du filtre 10 soit minimale. (Bien entendu, au lieu de comparer le signal S à la sortie du filtre 10, on peut également comparer l'excitation E à la trame du signal S ayant subi le traitement inverse des filtres 10 et 12.)At the origin of the CELP coding, the excitation E was selected in a table 18 (called "codebook") containing several possible excitations which in fact represented pieces of white noise. In this case, a control circuit 20 scan table 18 until the difference e, formed at 22, between the current frame of signal S and the corresponding frame at the output of the filter 10 is minimal. (Of course, instead of compare the signal S to the output of the filter 10, we can also compare the excitation E to the frame of the signal S having undergone the reverse processing of filters 10 and 12.)

Avec cette technique, outre les coefficients ai, p et G, on fournit l'adresse C sélectionnant la meilleure excitation E dans la table 18 à un décodeur muni d'une table homologue.With this technique, in addition to the coefficients ai, p and G, we provide the address C selecting the best excitation E in table 18 to a decoder provided with a peer table.

Chaque excitation contenue dans la table 18 est une séquence d'échantillons numériques correspondant respectivement aux échantillons de chacune des trames du signal à comprimer. Pour que la compression soit de qualité acceptable, il est nécessaire de stocker un nombre relativement important, de l'ordre de 1000, de séquences d'excitation. Each excitation contained in table 18 is a sequence of corresponding digital samples respectively to the samples of each of the frames of the signal to be compressed. For compression to be of acceptable quality, it is necessary to store a relatively large number of around 1000, excitation sequences.

Pour limiter la complexité de la procédure de recherche, on a proposé que chaque échantillon d'une séquence d'excitation ne puisse prendre que trois valeurs, à savoir, 0, 1 ou -1 (séquence d'excitation ternaire). On s'est aperçu que cela ne modifiait pas la qualité de la compression de manière perceptible.To limit the complexity of the search procedure, it has been proposed that each sample of an excitation sequence can only take three values, namely, 0, 1 or -1 (ternary excitation sequence). We realized that not noticeably change the quality of the compression.

La figure 2 représente un exemple de séquence d'excitation E que l'on a proposé pour réduire davantage la complexité de la recherche. Cette séquence d'excitation est dite binaire. Elle comprend plusieurs échantillons non-nuls de valeurs 1 et -1, deux échantillons non-nuls, ou impulsions, étant séparés par un nombre constant d'échantillons nuls, ici 3. Une telle séquence d'excitation peut être représentée par un nombre binaire (ou code d'excitation) C dont les bits sont associés aux impulsions et correspondent à la polarité de celles-ci. En procédant ainsi, le code C fourni par le circuit de commande 20 correspond directement à une séquence d'excitation et la table 18 est supprimée. La complexité est par ailleurs réduite du fait que les échantillons à prendre en compte se réduisent aux impulsions, dont le nombre est, dans l'exemple de la figure 2, quatre fois inférieur au nombre total d'échantillons d'une séquence. En outre, la structure des filtres 10 et 12 est simplifiée.Figure 2 shows an example of excitation sequence E which has been proposed to further reduce the complexity of research. This excitation sequence is called binary. It includes several non-zero samples with values 1 and -1, two non-zero samples, or pulses, being separated by a constant number of null samples, here 3. Such a sequence excitation can be represented by a binary number (or excitation code) C whose bits are associated with the pulses and correspond to the polarity of these. By proceeding thus, the code C supplied by the control circuit 20 corresponds directly to an excitation sequence and table 18 is deleted. The complexity is also reduced by the fact that the samples to be taken into account are reduced to pulses, whose number is, in the example of figure 2, four times less than the total number of samples in a sequence. In in addition, the structure of filters 10 and 12 is simplified.

Cette technique dégrade très légèrement la qualité de la compression, mais cette dégradation est compensée de manière simple par un traitement destiné à supprimer les effets de la régularité de l'espacement des échantillons non-nuls.This technique slightly degrades the quality of compression, but this degradation is compensated so simple treatment to suppress the effects of regularity of spacing of non-zero samples.

A chaque code C est associé un vecteur d'excitation C ayant pour composantes les valeurs 1 et -1 correspondant aux bits à 0 et 1 du code C. Les termes "vecteur" et "code" sont confondus dans la suite de la description.Each C code is associated with an excitation vector C having as components the values 1 and -1 corresponding to the bits at 0 and 1 of code C. The terms "vector" and "code" are confused in the following description.

Afin de réduire davantage le nombre d'essais pour minimiser l'erreur, on a proposé de limiter le nombre de vecteurs ou codes d'excitation possibles à un sous-ensemble représentatif d'un ensemble plus grand. L'article intitulé "A Comparison of some Algebraic Structures for CELP Coding of Speech" de J.P. Adoul et C. Lamblin dans Proc. ICASSP, 1987, décrit une telle méthode. Pour créer un sous-ensemble représentatif de tous les codes C de N bits, on forme l'ensemble des valeurs de n bits (n<N), dont chacune est complétée par N-n bits de correction d'erreur.To further reduce the number of trials for minimize the error, we proposed to limit the number of vectors or possible excitation codes to a representative subset of a larger whole. The article entitled "A Comparison of some Algebraic Structures for CELP Coding of Speech "by J.P. Adoul and C. Lamblin in Proc. ICASSP, 1987, describes such a method. To create a representative subset of all C codes of N bits, we form the set of values of n bits (n <N), each of which is supplemented by N-n bits error correction.

Pour trouver le meilleur vecteur d'excitation C, on cherche généralement à maximiser un critère de sélection défini par : m = scal2(T, Ci)/mod2(FCi), où C_i est le vecteur d'excitation essayé ; où T est un vecteur cible formé par des échantillons de la trame en cours d'analyse du signal S ayant subi le traitement inverse des filtres 10 et 12, ces échantillons étant ceux correspondant aux valeurs 1 et -1 du vecteur C_i ; et où F désigne la matrice représentant la fonction de transfert des filtres 10 et 12, dans laquelle on n'a conservé que les rangées correspondant aux valeurs 1 et -1 du vecteur C_i. Les notations scal(.,.) et mod(.,.) désignent respectivement le produit scalaire et le module.To find the best excitation vector C, we generally seek to maximize a selection criterion defined by: m = scal 2 (T, C i ) / mod 2 (FC i ) where C _i is the excitation vector tested; where T is a target vector formed by samples of the frame being analyzed of the signal S having undergone the reverse processing of filters 10 and 12, these samples being those corresponding to the values 1 and -1 of the vector C _i ; and where F denotes the matrix representing the transfer function of filters 10 and 12, in which only the rows corresponding to the values 1 and -1 of the vector C _i have been kept. The notations scal (.,.) And mod (.,.) Denote the scalar product and the module respectively.

L'essai de tous les vecteurs d'excitation C_i selon ce critère représente un nombre de calculs considérable à effectuer entre les arrivées de deux trames du signal S.The test of all the excitation vectors C _i according to this criterion represents a considerable number of calculations to be carried out between the arrivals of two frames of the signal S.

Il a été constaté que le dénominateur du critère m est approximativement constant, quel que soit le vecteur d'excitation C_i. Ainsi, le critère m est approximativement maximisé en maximisant le numérateur. Ce numérateur est maximisé lorsque chaque composante du vecteur d'excitation C_i est celle de même signe que l'échantillon correspondant du vecteur cible T. En d'autres termes, un code d'excitation optimal approximatif est obtenu directement en prenant pour ses bits les bits de signe des échantillons du vecteur cible (ou les inverses des bits de signe).It was found that the denominator of the criterion m is approximately constant, whatever the excitation vector C _i . Thus, the criterion m is approximately maximized by maximizing the numerator. This numerator is maximized when each component of the excitation vector C _i is that of the same sign as the corresponding sample of the target vector T. In other words, an approximate optimal excitation code is obtained directly by taking for its bits the sign bits of the target vector samples (or the inverse of the sign bits).

Cette solution n'est pas applicable dans le cas où les codes d'excitation utilisables sont limités à un sous-ensemble représentatif d'un ensemble plus grand obtenu, par exemple, à l'aide d'un code correcteur d'erreurs.This solution is not applicable in the case where the usable excitation codes are limited to a subset representative of a larger set obtained, for example, at using an error correction code.

Un objet de la présente invention est de prévoir un procédé permettant de limiter le nombre de calculs nécessaires pour maximiser le critère m susmentionné dans le cas où les codes d'excitation utilisables appartiennent à un sous-ensemble représentatif d'un ensemble plus grand.An object of the present invention is to provide a method for limiting the number of calculations required to maximize the aforementioned criterion m in the case where the usable excitation codes belong to a subset representative of a larger whole.

Pour atteindre cet objet, la présente invention prévoit un procédé de détermination d'un vecteur d'excitation associé à une trame d'un signal de parole à comprimer conformément à la revendication 1. To achieve this object, the present invention provides a method for determining an excitation vector associated with a frame of a speech signal to be compressed according to claim 1.

Selon un mode de réalisation de la présente invention, les bits de correction d'erreurs sont des bits d'un code correcteur de Hamming.According to an embodiment of the present invention, the error correction bits are bits of a correction code from Hamming.

Ces objets, caractéristiques et avantages ainsi que d'autres de la présente invention seront exposés en détail dans la description suivante de modes de réalisation particuliers, faite à titre non-limitatif à l'aide des figures jointes parmi lesquelles :

la figure 1, précédemment décrite, illustre une méthode de compression CELP ;

la figure 2, précédemment décrite, représente un exemple de séquence d'excitation et du code correspondant ; et

la figure 3 illustre des traitements à effectuer selon la présente invention pour sélectionner un vecteur d'excitation optimal dans le cas où celui-ci appartient à un sous-ensemble obtenu en utilisant un code correcteur d'erreurs.

These objects, characteristics and advantages as well as others of the present invention will be explained in detail in the following description of particular embodiments, given without limitation by means of the attached figures, among which:

Figure 1, previously described, illustrates a CELP compression method;

FIG. 2, previously described, shows an example of excitation sequence and the corresponding code; and

FIG. 3 illustrates the processing operations to be carried out according to the present invention for selecting an optimal excitation vector in the case where it belongs to a subset obtained using an error correcting code.

Pour maximiser le critère m susmentionné, on a constaté que le dénominateur de ce critère, le carré du module du vecteur FC_i, est approximativement constant, quel que soit le vecteur d'excitation C_i. Cette approximation est relativement bonne puisque le module du vecteur C_i est constant. Ainsi, pour maximiser le critère m de manière approchée, il suffit de maximiser un critère simplifié qui est le produit scalaire du vecteur cible T par le vecteur d'excitation C_i. Ce produit scalaire est maximal lorsque chaque composante (1 ou -1) du vecteur C_i est celle de même signe que l'échantillon correspondant du vecteur cible T. On obtient ainsi directement un vecteur d'excitation optimal approché Copt à partir du vecteur cible T.To maximize the above-mentioned criterion m, it has been found that the denominator of this criterion, the square of the module of the vector FC _i , is approximately constant, whatever the excitation vector C _i . This approximation is relatively good since the modulus of the vector C _i is constant. Thus, in order to maximize the criterion m, it suffices to maximize a simplified criterion which is the scalar product of the target vector T by the excitation vector C _i . This scalar product is maximum when each component (1 or -1) of the vector C _i is that of the same sign as the corresponding sample of the target vector T. We thus obtain directly an optimal excitation vector approximated Copt from the target vector T.

Ce procédé n'est pas directement applicable dans le cas où les codes d'excitation possibles appartiennent à un sous-ensemble représentatif d'un ensemble plus grand, par exemple lorsque ce sous-ensemble est formé à partir de valeurs de n bits auxquels on ajoute N-n bits d'un code correcteur d'erreurs. En effet, il est alors très probable que le code d'excitation trouvé n'appartienne pas au sous-ensemble. Dans ce cas, on pourrait envisager de ramener le code d'excitation trouvé à un code d'excitation appartenant au sous-ensemble en lui appliquant une fonction de correction d'erreur associée au code correcteur. On tombe alors sur le code d'excitation du sous-ensemble qui est le plus proche du code d'excitation trouvé. Cette "correction d'erreur" a pour conséquence de modifier au moins un bit du code d'excitation, ce bit pouvant avoir dans certains cas une grande influence sur la valeur du critère m, de manière que le code d'excitation final fournisse de mauvais résultats.This process is not directly applicable in the cases where the possible excitation codes belong to a subset representative of a larger whole, for example when this subset is formed from values of n bits to which we add N-n bits of an error correcting code. In effect then it is very likely that the excitation code found does not belong to the subset. In this case, we might consider reducing the excitation code found to a excitation code belonging to the subset by applying it an error correction function associated with the correction code. We then fall on the excitation code of the subset which is closest to the excitation code found. This "error correction" has the effect of modifying at least one bit of the excitation code, this bit can have in some cases a great influence on the value of the criterion m, so that the final excitation code gives poor results.

A titre d'exemple, on utilise ci-après un code correcteur de Hamming noté H(N, n, 3), où 3 désigne la distance minimale de Hamming séparant deux éléments appartenant au sous-ensemble représentatif. La distance de Hamming entre deux valeurs est définie comme étant le nombre de différences bit à bit entre ces deux valeurs. Avec cette solution, on crée un sous-ensemble de 2ⁿ vecteurs d'excitation de N bits.By way of example, a Hamming correcting code denoted H (N, n, 3) is used below, where 3 denotes the minimum Hamming distance separating two elements belonging to the representative subset. The Hamming distance between two values is defined as the number of bitwise differences between these two values. With this solution, we create a subset of 2 ⁿ excitation vectors of N bits.

Selon l'invention, on forme un groupe de codes d'excitation comprenant un code initial trouvé en maximisant le critère m simplifié ainsi que tous les autres codes obtenus à partir de celui-ci en modifiant un seul bit. Ceci a pour conséquence, en utilisant un code correcteur de Hamming permettant de corriger un bit unique (distance de Hamming minimale 3), que chacun des codes d'excitation du groupe est proche d'un code distinct du sous-ensemble utilisable. Ensuite, on applique à chacun des codes du groupe la fonction de correction d'erreur de Hamming, ce qui ramène chacun des codes du groupe au code le plus proche du sous-ensemble. On obtient un groupe de codes "corrigés" appartenant au sous-ensemble et qui "entoure" le code initialement trouvé. Parmi les codes corrigés, on retient comme code optimal approché celui qui maximise le critère m complet, c'est-à-dire en calculant son numérateur et son dénominateur.According to the invention, a group of excitation codes is formed including an initial code found by maximizing the simplified criterion m as well as all the other codes obtained at from there by changing a single bit. This has the consequence, using a Hamming correcting code to correct a single bit (minimum Hamming distance 3), that each of the group's excitation codes is close to a code distinct from the usable subset. Then we apply to each of the group codes the error correction function of Hamming, which brings each of the group's codes back to the code closer to the subset. We get a group of codes "corrected" belonging to the subset and which "surrounds" the code originally found. Among the corrected codes, we retain as approximate optimal code that which maximizes the complete criterion m, that is, by calculating its numerator and denominator.

La figure 3 représente sous forme de schéma le procédé selon l'invention qui vient d'être décrit. La trame analysée du signal S subit en 24 le traitement inverse de celui des filtres 10 et 12 de la figure 1. On obtient ainsi un vecteur cible T, dont on ne garde que les échantillons correspondant aux impulsions des séquences d'excitation. En 26, on ne retient des échantillons du vecteur T que les bits de signe (ou leurs inverses) pour fournir un code d'excitation initial C₀. Ce code C₀ est "corrompu" en 28 pour former un groupe de codes comprenant le code C₀ et tous les autres codes C₁ à C_N obtenus en modifiant un seul bit du code C₀. Chaque code C₀ à C_N subit en 30 une "correction d'erreur" pour fournir un groupe de codes corrigés C'₀ à C'_N. En 32, chacun des vecteurs associés aux codes corrigés est comparé au vecteur cible T, et on retient comme code d'excitation optimal approché Copt celui associé au vecteur qui maximise le critère m complet.FIG. 3 represents in diagram form the method according to the invention which has just been described. The analyzed frame of the signal S undergoes the reverse processing at 24 of that of the filters 10 and 12 of FIG. 1. A target vector T is thus obtained, of which only the samples corresponding to the pulses of the excitation sequences are kept. In 26, samples of the vector T are retained only the sign bits (or their inverses) to provide an initial excitation code C ₀ . This code C ₀ is "corrupted" at 28 to form a group of codes comprising the code C ₀ and all the other codes C ₁ to C _N obtained by modifying a single bit of the code C ₀ . Each code C ₀ to C _N undergoes an "error correction" at 30 to provide a group of corrected codes C ' ₀ to C' _N. In 32, each of the vectors associated with the corrected codes is compared with the target vector T, and the optimal associated excitation code Copt is that associated with the vector which maximizes the complete criterion m.

Généralement, pour obtenir de meilleurs résultats, la position de la première impulsion des séquences d'excitation E est variable. Dans l'exemple de la figure 2, cette position peut être l'une des quatre premières, ce qui est déterminé par deux bits supplémentaires à transmettre au décodeur et qui multiplie par quatre le nombre de vecteurs d'excitation à essayer. Dans ce cas, on commence par former, pour chacune des quatre positions possibles, un vecteur cible et un vecteur d'excitation dont les composantes sont celles de même signe que celles du vecteur cible correspondant. On retient comme vecteur d'excitation optimal approché, parmi les quatre vecteurs ainsi obtenus, celui qui maximise le critère m complet.Generally, for best results, the position of the first pulse of the excitation sequences E is variable. In the example in Figure 2, this position can be one of the first four, which is determined by two additional bits to be transmitted to the decoder and which multiplies by four the number of excitation vectors to be tested. In this case, we start by forming, for each of the four positions possible, a target vector and an excitation vector whose components are those of the same sign as those of the vector corresponding target. We use as excitation vector optimal approached, among the four vectors thus obtained, that which maximizes the complete criterion m.

Claims

A method for determining an excitation vector (Copt) associated with a frame of a speech signal (S) to compress, said vector belonging to a subset associated with a larger set of excitation vectors likely to maximize a criterion (m), and having as components values 1 and -1 corresponding to a sequence of excitation samples (E) of a linear prediction filter (10), said criterion being equal to the square of the ratio between, on the one hand, the scalar product of the excitation vector by a target vector (T) formed by samples of the frame submitted to an inverse linear prediction filtering and, on the other hand, the module of the excitation vector submitted to a direct linear prediction filtering, the excitation vectors being associated with excitation codes having bits corresponding to the signs of the components of the excitation vectors, the excitation code subset associated with said vector subset being formed by binary values completed by error correction bits, any excitation code being associated with an excitation code of the subset through an error correction function, characterized in that it includes the following steps:

preselecting an excitation vector (C₀) having as components those with the same signs as the corresponding samples of the target vector, or those with the opposite signs; and

if the preselected excitation vector (C₀) does not belong to said subset, selecting as an excitation vector the vector which maximizes said criterion (m) among the vectors (C'₀... C'_N) of the subset, which are respectively obtained by submitting the error correction function the preselected associated with the preselected vector (C_o) and the group of codes (C₁... C_N) the closest therefrom, formed so that each of the closest codes differs from the preselected code by a single bit; and

selecting as the excitation code (Copt), among the corrected codes, the one associated with the vector which maximizes said criterion (m).
A method according to claim 1, wherein the error correction bits are the bits of a Hamming correcting code.