JP3130934B2

JP3130934B2 - Variable bit rate audio encoder

Info

Publication number: JP3130934B2
Application number: JP04508295A
Authority: JP
Inventors: エルクロスマン、アントニー; エストンプソン、エドムンド
Original assignee: ピクチュアテルコーポレイション
Priority date: 1991-03-05
Filing date: 1992-03-04
Publication date: 2001-01-31
Anticipated expiration: 2016-01-31
Also published as: EP0574523A4; DE69229627T2; CA2105387A1; EP0574523A1; WO1992015986A1; JPH06506330A; DE69229627D1; EP0574523B1

Description

DETAILED DESCRIPTION OF THE INVENTION

This application is a continuation-in-part of U.S. application Ser. No. 07/665,948, filed March 6, 1991, which is a continuation of U.S. application Ser. No. 07/664,579, filed March 5, 1991.

BACKGROUND OF THE INVENTION

The present invention relates to the transmission of audio and video information over a channel of fixed capacity, such as a telephone communication channel.

Video conferencing systems typically transmit both audio and video information over the same channel. A portion of the channel bandwidth is usually dedicated to the audio information, and the remaining bandwidth is allocated to the video signal.

The amount of video and audio information varies with time. For example, a person at one end of the system may be silent at a given moment. If the system includes a variable-capacity audio encoder, very little information needs to be transmitted during such a period of silence.

Similarly, the video information may change little or not at all between frames, for example when every object in the field of view is stationary. If the system has a variable-capacity video encoder, there is little information to transmit during these periods of inactivity. At the other extreme, during periods of high activity the amount of video information may exceed the channel capacity allocated to video. The system then transmits as much video information as possible and discards the rest.

Typically, a video encoder gives priority to the most salient features of the video signal. High-priority information is therefore transmitted first, and less noticeable low-priority information is temporarily discarded when the channel does not have sufficient capacity. It is therefore desirable to make as much bandwidth as possible available for video.

Accordingly, it is an object of the present invention to reduce the amount of channel bandwidth allocated to audio information when there is little audio information to transmit. The remainder of the bandwidth is allocated to video information. On average, therefore, the bit rate for audio information is lowered and the bit rate for video information is raised.

SUMMARY OF THE INVENTION

The present invention relates to a method and apparatus for allocating the transmission bits used to transmit samples of a digital signal. For a frame consisting of a plurality of samples of the digital signal, an aggregate allowable quantization distortion value is selected, representing the aggregate allowable quantization distortion error for the frame. A set of samples is selected from the frame such that each selected sample is greater than a noise threshold. For each sample in the set, a sample quantization distortion value is calculated, representing the allowable quantization distortion error for that sample. The sum of all the sample quantization distortion values is approximately equal to the aggregate allowable quantization distortion value. For each sample in the set, a quantization step size is selected that yields a quantization distortion error approximately equal to that sample's quantization distortion value. Each sample is then quantized using its quantization step size.
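The link between an allowable distortion value and a step size can be illustrated with a minimal sketch. It assumes a uniform quantizer and the standard high-resolution approximation that the mean squared quantization error is about step²/12; the text does not state which quantizer model the invention uses, and the function names are hypothetical.

```python
import math

def step_for_distortion(distortion):
    """Step size whose mean squared error is roughly `distortion`.

    Assumption: uniform quantizer, error approximately step**2 / 12.
    """
    return math.sqrt(12.0 * distortion)

def quantize(x, step):
    """Map a sample to the index of the nearest multiple of `step`."""
    return round(x / step)

def dequantize(index, step):
    """Reconstruct the sample value from its quantization index."""
    return index * step
```

Under this model a larger allowable distortion gives a larger step, and so fewer distinct indices to transmit.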

In a preferred embodiment, the digital signal includes a noise component and a signal component. A signal index, representing the ratio of the value of the signal component to the value of the noise component, is prepared for at least one sample of the frame. The aggregate allowable quantization distortion is selected based on the signal index.

The sample quantization distortion values are calculated by first dividing the aggregate allowable quantization distortion value by the number of samples in the frame to form a first sample distortion value. A trial set of samples of the digital signal is selected such that each sample in the set is greater than a noise threshold determined at least in part by the first sample distortion value. The first sample distortion value is then adjusted by an amount determined by the difference between the first sample distortion value and at least one sample excluded from the trial set (i.e., a "noisy sample"). The following steps are then repeated: identifying noisy samples in the trial set based on the adjusted distortion value, removing the noisy samples (if any) from the trial set, and readjusting the first sample distortion value by an amount determined by the difference between the first sample distortion value and the noisy samples. These steps are repeated until an adjusted first sample distortion value is reached for which no additional noisy samples are found in the trial set, or until the steps have been repeated a maximum number of times.

After the adjustment is complete, the number of bits required to transmit all the samples in the trial set is estimated. The estimated number of bits is compared with a maximum number of bits. If the estimated number of bits is less than or equal to the maximum number of bits, a final noise threshold is selected based on the adjusted first sample distortion value.

If the estimated number of bits exceeds the maximum number of bits, a second sample distortion value is prepared. A second trial set of samples of the digital signal is selected such that each sample in the set has a value greater than the second sample distortion value. The number of bits required to transmit all the samples in the second trial set is then estimated and compared with the maximum number of bits. If the estimated number of bits is greater than the maximum number of bits, the second sample distortion value is increased and the second trial set is reselected based on the adjusted second sample distortion value. The number of bits required to transmit the second trial set is estimated again. This process is repeated until a second sample distortion value is reached for which the estimated number of bits is less than or equal to the maximum number of bits.
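The search for the second sample distortion value can be sketched as a simple loop. This is only an illustration: the bit-cost model (`estimate_bits`), the multiplicative `growth` factor, and the iteration cap are hypothetical, since the summary does not say how the bit count is estimated or by how much the value is increased.

```python
def estimate_bits(samples, bits_per_sample=4):
    """Hypothetical cost model: a fixed number of bits per kept sample."""
    return bits_per_sample * len(samples)

def fit_to_budget(frame, distortion, max_bits, growth=1.25, max_iters=100):
    """Raise the distortion value until the trial set fits in max_bits.

    Returns the final distortion value and the trial set it selects.
    `growth` and `max_iters` are illustrative assumptions.
    """
    trial = []
    for _ in range(max_iters):
        # Keep only samples whose magnitude exceeds the distortion value.
        trial = [s for s in frame if abs(s) > distortion]
        if estimate_bits(trial) <= max_bits:
            break
        distortion *= growth
    return distortion, trial
```

Each pass keeps fewer samples, so the estimated bit count falls until it fits the budget.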

Next, the sample distortion values are calculated from the adjusted first sample distortion value and the second sample distortion value. A final set of samples of the digital signal is selected such that each sample in the final set has a value greater than a final threshold determined by that sample's corresponding sample distortion value.

In another aspect, the invention relates to a method and apparatus for communicating a digital signal that includes a noise component and a signal component. An estimation signal is prepared that represents the digital signal but has fewer samples than the digital signal. A signal index is prepared that represents, for at least one sample of the estimation signal, the ratio of the value of the signal component to the value of the noise component. Based on the signal index, samples of the digital signal having a sufficiently large signal component are selected. Both the selected samples of the digital signal and the samples of the estimation signal are transmitted to a remote device. The remote device reconstructs the digital signal from the transmitted selected samples and the estimation samples.

In a preferred embodiment, the digital signal is a frequency-domain audio signal representing the audio information to be communicated, and each sample of the estimation signal is a spectral estimate of the frequency-domain signal in a corresponding frequency band. To reconstruct the frequency-domain signal, a random number is generated for each unselected sample of the frequency-domain audio signal. A noise estimate of the magnitude of the noise component of the unselected sample is prepared from at least one spectral estimate. A scaling factor is generated based on the noise estimate. The random number is then scaled according to the scaling factor to form a reconstructed sample representing the unselected sample.

The estimation signal and the frequency-domain audio signal each comprise a series of frames. Each frame represents the audio information over a specified time window. To form a noise estimate for the current frame, an initial noise estimate is first formed for a preceding frame of the estimation signal. A rise time constant t_r and a fall time constant t_f are selected based on the value of the signal index. The selected rise time constant is added to the initial noise estimate to form an upper threshold, and the selected fall time constant is subtracted from the initial noise estimate to form a lower threshold.

Next, the current spectral estimate of the current frame, representing the same frequency band, is compared with the upper and lower thresholds. If the current spectral estimate lies between the thresholds, the current noise estimate is set equal to the current spectral estimate. If the current spectral estimate is greater than the upper threshold, the current noise estimate is set equal to the upper threshold.
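The threshold comparison amounts to clamping the new spectral estimate to a window around the previous noise estimate. The sketch below is illustrative only. The text states the in-window and above-window cases; the below-window case is filled in here by symmetry and is an assumption, as are the function and parameter names.

```python
def update_noise_estimate(prev_noise, spectral_estimate, t_r, t_f):
    """Limit how fast the per-band noise estimate may rise or fall.

    prev_noise: initial noise estimate from the preceding frame.
    t_r, t_f:   rise and fall constants chosen from the signal index.
    """
    upper = prev_noise + t_r
    lower = prev_noise - t_f
    if spectral_estimate > upper:
        return upper
    if spectral_estimate < lower:
        # Assumption: not stated in the text, chosen by symmetry.
        return lower
    return spectral_estimate
```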

To generate the scaling factor, a noise coefficient and a voice coefficient are first selected based on the value of the signal index. The current noise estimate is then multiplied by the noise coefficient. Similarly, the current spectral estimate is multiplied by the voice coefficient. The two products are added, and the scaling factor is formed from the sum.
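A minimal sketch of the scaling-factor computation, and of the random fill it feeds, is given below. Two points are assumptions: the scaling factor is taken to be the weighted sum itself (the text says only that it is formed from the sum), and the random numbers are drawn uniformly from [-1, 1]. All names are hypothetical.

```python
import random

def scaling_factor(noise_estimate, spectral_estimate, noise_coeff, voice_coeff):
    """Weighted sum of the noise estimate and the spectral estimate.

    Assumption: the sum is used directly as the scaling factor.
    """
    return noise_coeff * noise_estimate + voice_coeff * spectral_estimate

def fill_unselected(count, scale, rng):
    """Stand-in values for coefficients that were not transmitted.

    Assumption: uniform random numbers in [-1, 1], scaled by `scale`.
    """
    return [scale * rng.uniform(-1.0, 1.0) for _ in range(count)]
```

Passing an explicit `random.Random` instance keeps the fill reproducible when testing.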

In another aspect, the invention relates to a method and apparatus for estimating the energy of the speech component of each of a series of frames of a digital signal. A first frame noise estimate is prepared, representing the noise energy in a first frequency band of a first frame. A signal energy value is also formed, representing the energy of the digital signal in the first frequency band of a second frame. If the signal energy value is greater than the first frame noise estimate, a small increment is added to the first frame noise estimate to form a second frame noise estimate. If the signal energy value is less than the first frame noise estimate, a large decrement is subtracted from the first frame noise estimate to form the second frame noise estimate. The second frame noise estimate is subtracted from the signal energy in the first frequency band to form a speech estimate representing the energy of the speech component in the first frequency band of the second frame.
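The asymmetric update (slow rise, fast fall) can be sketched as follows. The increment and decrement sizes are hypothetical, and clamping the speech estimate at zero is an added assumption that the text does not state.

```python
def track_noise(prev_noise, energy, up=0.25, down=2.0):
    """Per-band noise estimate for the next frame.

    Rises by a small step when the band energy is above the previous
    estimate and falls by a large step when it is below.
    `up` and `down` are illustrative values.
    """
    if energy > prev_noise:
        return prev_noise + up
    if energy < prev_noise:
        return prev_noise - down
    return prev_noise

def speech_estimate(energy, noise):
    """Band energy attributed to speech (clamped at zero by assumption)."""
    return max(energy - noise, 0.0)
```

With these settings a burst of speech raises the noise estimate only slowly, while a drop in energy pulls it down quickly.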

Other objects, features, and advantages of the invention will become more apparent from the following description of preferred embodiments, taken together with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1(a) is a block diagram of the near end of a video conferencing system.

FIG. 1(b) is a block diagram of the far end of the video conferencing system.

FIGS. 2(a) to 2(c) are diagrams showing a set of frequency coefficients and two types of estimates of the frequency coefficients.

FIGS. 3(a) and 3(b) are flowcharts showing a process for computing the spectral estimates.

FIG. 4 is a diagram showing a set of spectral estimates grouped into broad bands.

FIG. 5 is a block diagram of a speech detector.

FIG. 6 is a flowchart showing a method for estimating the amount of speech in a frame of the microphone signal.

FIG. 7 is a block diagram of a bit rate estimator.

FIGS. 8(a) and 8(b) are flowcharts showing a method for computing a first estimate of the allowable distortion per spectral estimate.

FIG. 9 is a diagram showing a method for computing a second estimate of the allowable distortion per spectral estimate.

FIG. 10 is a flowchart showing a method for computing the quantization step sizes.

FIG. 11 is a diagram illustrating interpolation between band quantization step sizes to form coefficient quantization step sizes.

FIG. 12 is a block diagram of a coefficient fill-in module.

DESCRIPTION OF THE PREFERRED EMBODIMENT

System Overview

Referring to FIGS. 1(a) and 1(b), the video conferencing system includes a near-end microphone 12 that receives the voice of a person speaking at the near end and generates an electronic microphone signal m(t) representing that voice (where t is a variable representing time). Similarly, a camera 14 is focused on the person speaking to generate a video signal v(t) representing an image of the person.

The microphone signal is supplied to an audio encoder 16, which digitizes and encodes the microphone signal m(t). As described in detail below, the encoder 16 generates a set of digitally encoded signals F_h(k), S_h(j), g_i(F), and A_i(F) that together represent the microphone signal. Here k and j are integers representing frequency values, and F is an integer identifying a "frame" of the microphone signal. Similarly, the video signal v(t) is supplied to a video encoder 18, which digitizes and encodes the video signal.

The encoded video signal V_e and the set of encoded microphone signals are supplied to a bitstream controller 20, which merges the signals into a serial bitstream for transmission over a transmission channel 22.

Referring to FIG. 1(b), the far-end bitstream controller 24 receives the bitstream from the transmission channel 22 and separates it into video and audio components. The received video signal V_e is decoded by a video decoder 26 and supplied to a display device 27, which reproduces the near-end camera image. At the same time, an audio decoder 28 decodes the received set of microphone signals and generates a loudspeaker signal L(t) representing the near-end microphone signal m(t). A loudspeaker 29 reproduces the near-end sound in response to the loudspeaker signal.

Referring to FIG. 1(a), the encoder 16 includes an input signal conditioner 30 that digitizes and filters the microphone signal m(t) to produce a digitized microphone signal m(n), where n is an integer representing a moment in time. The digitized microphone signal m(n) is separated into groups m_F(n), each consisting of 512 consecutive samples in the illustrated example (each group is referred to as a frame). The frames are chosen to overlap one another. More specifically, each frame contains the last 16 samples of the immediately preceding group together with 496 new samples.
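The overlapping framing can be sketched as below. The frame length and overlap come from the text; treating the first frame as 512 entirely new samples and dropping a trailing partial frame are assumptions made for the illustration.

```python
def split_frames(m, frame_len=512, overlap=16):
    """Split a sample sequence into overlapping frames.

    Each frame after the first starts with the last `overlap` samples
    of the previous frame, followed by frame_len - overlap new samples.
    A trailing partial frame is dropped (assumption).
    """
    hop = frame_len - overlap  # 496 new samples per frame
    frames = []
    start = 0
    while start + frame_len <= len(m):
        frames.append(m[start:start + frame_len])
        start += hop
    return frames
```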

Each frame of samples is supplied to a normalization module 34, which computes the average energy E_av of all the microphone samples in the frame.

Module 34 then computes, for each frame F, a normalization gain g(F) equal to the square root of the average energy E_av of that frame.
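A minimal sketch of the gain computation, assuming that the average energy is the mean of the squared samples (which makes g the root-mean-square value of the frame). The function name is hypothetical.

```python
import math

def normalization_gain(frame):
    """Square root of the frame's average energy E_av.

    Assumption: E_av is the mean of the squared sample values.
    """
    e_av = sum(x * x for x in frame) / len(frame)
    return math.sqrt(e_av)
```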

The normalization gain g is used at the near end to scale the microphone samples in each frame, as described in detail below. The same gain is used at the far end to restore the decoded microphone samples to their original scale. Accordingly, the gain used to scale the samples at the near end must be transmitted to the far end for use in rescaling the microphone signal. Because the gain g computed by module 34 has a relatively large number of bits, module 34 supplies the gain g to a quantizer 35, which generates a gain quantization index g_i that represents the gain g using fewer bits than are needed to specify g itself. The gain quantization index g_i is then supplied to the bitstream controller 20 for transmission to the far end.

The far-end audio decoder 28 reconstructs the gain from the transmitted gain quantization index g_i and uses the reconstructed gain g_q to restore the microphone signal to its original scale. Because the reconstructed gain g_q typically differs somewhat from the original gain g, the near-end audio encoder 16 normalizes the microphone signal using the same gain g_q that is used at the far end. More specifically, an inverse quantizer reconstructs the gain g_q from the gain quantization index g_i in the same manner as the far-end audio decoder 28.
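The reason for normalizing with g_q rather than g can be shown with a small sketch. The quantizer here (uniform steps on a decibel scale, `STEP_DB`) is purely hypothetical, since the text does not describe quantizer 35. The point is that the encoder divides by the same reconstructed gain that the decoder will later multiply by.

```python
import math

STEP_DB = 1.5  # hypothetical quantizer resolution in decibels

def quantize_gain(g):
    """Gain quantization index g_i (assumes g > 0)."""
    return round(20.0 * math.log10(g) / STEP_DB)

def dequantize_gain(index):
    """Reconstructed gain g_q, computed identically at both ends."""
    return 10.0 ** (index * STEP_DB / 20.0)

def normalize(frame, g):
    """Encoder side: divide by g_q, not by the unquantized g."""
    index = quantize_gain(g)
    g_q = dequantize_gain(index)
    return index, [x / g_q for x in frame]

def denormalize(index, normalized):
    """Decoder side: restore the original scale from g_i alone."""
    g_q = dequantize_gain(index)
    return [y * g_q for y in normalized]
```

Because both ends derive g_q from the same index, the gain quantization error cancels in the round trip.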

The quantized gain g_q and the microphone signal frame m_F(n) are transferred to a discrete cosine transform (DCT) module 36, which divides each microphone sample m_F(n) by the gain g_q to form a normalized sample m′(n). The DCT 36 then transforms the group of normalized microphone samples m′(n) into the frequency domain using the well-known discrete cosine transform algorithm. The DCT 36 thus generates 512 frequency coefficients F(k) representing samples of the frame's frequency spectrum.

The frequency coefficients F(k) are encoded (using an entropy-adaptive transform coder described below) and transmitted to the far end. To reduce the number of bits required to transmit these coefficients, the encoder 16 estimates the relative amounts of signal (e.g., speech) and noise in various regions of the frequency spectrum and chooses not to transmit coefficients for frequency bands that contain a relatively large amount of noise. In addition, for each coefficient selected for transmission, the encoder 16 selects the number of bits used to represent the coefficient depending on the amount of noise present in the frequency band containing that coefficient's frequency. More specifically, it is well known that a listener can tolerate more interfering noise in regions of the audio spectrum that contain a relatively large amount of audio signal, because the audio signal tends to mask the noise. The encoder 16 therefore coarsely quantizes coefficients that carry a relatively large amount of audio signal, since the audio signal masks the quantization distortion introduced by the coarse quantization. In this way the encoder 16 minimizes the number of bits required to represent each coefficient by selecting a quantization step size matched to the amount of audio signal that the coefficient represents.

To estimate the relative amounts of noise and signal in various regions of the spectrum, the frequency coefficients F(k) are first supplied to a spectral estimation module 38. As described in detail below, module 38 reduces the frequency coefficients F(k) to a smaller number of spectral estimates S(j) that represent the frame's frequency spectrum in less detail (j is an integer identifying a frequency band in the spectrum). A speech detection module 40 processes the spectral estimates of each frame to estimate the energy component of the microphone signal that is due to noise in the near-end room. Module 40 then supplies a signal index A_i(F) that approximates the percentage of the microphone signal in the frame that is attributable to speech information.

The signal index A_i(F) and the spectral estimates S(j) are both used to quantize and encode the frequency coefficients F(k) for transmission to the far end. Both are therefore needed at the far end to reconstruct the frequency coefficients. The signal index A_i(F) has only three bits and is therefore supplied directly to the bitstream controller 20 for transmission. The spectral estimates S(j), however, are converted to log spectral estimates Log₂S²(j) and encoded using a well-known differential pulse code modulation (DPCM) encoder 39 to reduce the number of bits to be transmitted. (The DPCM is a first-order DPCM with a predictor coefficient of unity gain.) The encoded log spectral estimates S_e(j) are further encoded using a Huffman encoder 49 to reduce the number of bits still further. The resulting Huffman codes S_h(j) are supplied to the bitstream controller 20 for transmission.
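A first-order DPCM with a unity predictor can be sketched as follows. The step size and the zero initial prediction are assumptions, and the Huffman stage is omitted. The encoder also returns the values the decoder will reconstruct, which is what allows both ends to work from identical estimates.

```python
import math

def dpcm_encode(values, step=1.0):
    """First-order DPCM: code the difference from the last reconstruction.

    Returns the difference codes and the reconstructed values that the
    decoder will see. `step` and the initial prediction of 0.0 are
    illustrative assumptions.
    """
    codes, recon = [], []
    pred = 0.0
    for v in values:
        code = round((v - pred) / step)
        pred = pred + code * step  # unity-gain predictor
        codes.append(code)
        recon.append(pred)
    return codes, recon

def dpcm_decode(codes, step=1.0):
    """Rebuild the values by accumulating the decoded differences."""
    out, pred = [], 0.0
    for c in codes:
        pred += c * step
        out.append(pred)
    return out

def log_spectral_estimates(s):
    """Log2 of the squared spectral estimates, as in the text."""
    return [math.log2(x * x) for x in s]
```

Predicting from the reconstructed value rather than the original keeps quantization errors from accumulating across bands.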

The far-end audio decoder 28 reconstructs the log spectral estimates Log₂S²(j) from the Huffman codes S_h(j). Because of the operation of the DPCM encoder 39, however, the reconstructed spectral estimates Log₂S_q²(j) are not identical to the original estimates Log₂S²(j). For this reason, a near-end decoder 41 decodes the encoded estimates S_e(j) in the same manner as is done at the far end, and the estimates Log₂S_q²(j) decoded in this way are used to quantize and encode the frequency coefficients F(k). Thus, as described in detail below, the audio encoder 16 encodes the frequency coefficients using the same estimates Log₂S_q²(j) that the far-end audio decoder 28 uses to reconstruct the signal at the far end.

Based on the decoded spectral estimates Log₂S_q²(j) and the value of the signal index A_i(F), a bit rate estimator 42 selects groups of frequency coefficients that, taken as a whole, carry enough speech information to merit transmission to the far end. The bit rate estimator then selects, for each coefficient to be transmitted, a quantization step size that determines the number of bits used to quantize the coefficient for transmission. For each selected group of frequency coefficients, the bit rate estimator 42 first computes a group quantization step size Q(j) and a "class" C(j) (described below) for each frequency band j. The group quantization step sizes are then interpolated to give a coefficient quantization step size Q(k) for each frequency coefficient F(k). Q(k) is supplied to a coefficient quantizer 44, which quantizes each frequency coefficient in the band according to its assigned step size to produce a corresponding quantization index I(k).

A limit controller 45 generates a Huffman index I_h(k) in response to each quantization index I(k). The Huffman indices I_h(k) supplied by the limit controller are then encoded by a Huffman encoder 47 to give Huffman codes F_h(k). The Huffman codes are supplied to the bitstream controller 20 for transmission over the transmission channel 22.

The limit controller begins this encoding process with the lowest-frequency quantization index (i.e., k = 0) and continues until all the indices have been encoded or until the number of encoded bits exceeds the capacity of channel 22 allocated to the microphone signal. When the limit controller 45 detects that the capacity has been exceeded, it discards the remaining indices I_h(k) and issues a unique Huffman index I_h(k) indicating that the remaining frequency coefficients of the frame are not encoded for transmission. (As described below, the far-end decoder estimates the unencoded coefficients from the transmitted spectral estimates.)
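The limit controller's behavior can be sketched as a budgeted pass over the indices from k = 0 upward. The `END_OF_FRAME` marker, the `code_length` cost function, and the choice to reserve room for the marker before accepting each index are all hypothetical details added for the illustration.

```python
END_OF_FRAME = "EOF"  # hypothetical stand-in for the unique Huffman index

def limit_indices(indices, code_length, budget_bits, eof_bits=2):
    """Keep indices, lowest frequency first, while they fit the budget.

    code_length(idx) gives the assumed Huffman code length in bits.
    Space for the end marker is reserved up front (assumption), so the
    emitted stream never exceeds budget_bits.
    """
    out, used = [], 0
    for idx in indices:
        cost = code_length(idx)
        if used + cost + eof_bits > budget_bits:
            # Discard this and all remaining indices.
            out.append(END_OF_FRAME)
            break
        out.append(idx)
        used += cost
    return out
```

Because the pass starts at k = 0, it is always the highest-frequency coefficients that are dropped when the budget runs out.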

Spectral estimator for calculating the spectral estimate S(j)

Referring to FIGS. 2(a)-2(c) and 3(a)-3(b), spectral estimation module 38 is now described in detail. Module 38 first divides the frequency coefficients F(k) into bands of L adjacent coefficients, where L is preferably 10 (step 110). For each band j, module 38 provides a first approximation sample representing the entire band by calculating the spectral energy in that band. More specifically, module 38 squares each frequency coefficient F(k) in the band and sums the squares of all coefficients in the band (steps 112, 114) (see also FIGS. 2(b) and 2(c)).
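The first approximation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `band_energies` and the handling of a short final band are assumptions made for the example.

```python
def band_energies(coeffs, band_size=10):
    """Return the sum of squared coefficients for each band.

    coeffs    : sequence of frequency coefficients F(k)
    band_size : number of adjacent coefficients per band (L, preferably 10)
    A final band shorter than band_size is summed as-is (an assumption).
    """
    energies = []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        energies.append(sum(c * c for c in band))
    return energies
```

For twenty coefficients split into two bands, the function returns one energy value per band.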

This approximation gives only a poor representation of the spectrum when the spectrum contains a high-energy tone at the boundary between adjacent bands. For example, the spectrum shown in FIG. 2(b) includes a tone 50 at the boundary between bands 52 and 54. Interpolating between approximation sample 58 (the sum of the squares of all coefficients in band 54) and approximation sample 56 (the sum of the squares of all coefficients in band 52) yields a value 59 that does not accurately reflect the presence of tone 50.

Module 38 therefore also uses a second approximation technique, in which the spectral estimates for both bands 52 and 54 reflect the presence of tone 50 near the boundary between them.

More specifically, module 38 derives a second approximation for each band using the 10 samples in the band together with the 5 nearest samples from each adjacent band (see FIG. 2(a)). Module 38 performs a logical inclusive OR operation on the binary values of all 20 samples (step 116). This operation gives a computationally inexpensive estimate of the magnitude of the largest sample in the set. Each bit of the result of the OR operation is set to 1 if any operand has a 1 in that bit position (e.g., 0110 OR 0011 = 0111). The result of the OR operation is therefore at least equal to the value of the largest sample in the set and at most twice that value. The result of the OR operation is then doubled to give the second approximation (step 117).
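A rough sketch of the OR-based second approximation follows. It assumes the samples are available as non-negative integers (for instance fixed-point magnitudes or squared magnitudes); the text does not say which quantity is ORed, and the names `or_estimate` and `second_approximation` are hypothetical.

```python
def or_estimate(samples):
    """Return twice the bitwise inclusive OR of non-negative integer samples."""
    acc = 0
    for s in samples:
        acc |= s          # a bit is set if any sample has it set
    return 2 * acc        # doubled, as in step 117


def second_approximation(values, band, band_size=10, overlap=5):
    """OR-based estimate for one band, using the band's own samples plus
    `overlap` samples from each neighbouring band (clipped at the edges)."""
    lo = max(0, band * band_size - overlap)
    hi = min(len(values), (band + 1) * band_size + overlap)
    return or_estimate(values[lo:hi])
```

With a single large value placed on the boundary between bands 0 and 1, both bands see it in their 20-sample window while band 2 does not, which is the behaviour the second approximation is meant to provide.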

As shown in FIG. 2(a), the second approximation for band 52 includes tone 50 from band 54, and therefore gives a relatively large approximation value 60 for band 52. Second approximation 60 reflects the presence of tone 50 more accurately than first approximation 56. Module 38 therefore compares the second approximation with the first approximation (step 118) and selects the larger of the two as the squared spectral estimate S²(j) (steps 120-122). Finally, module 38 calculates the logarithm of the squared spectral estimate, Log₂ S²(j) (step 224).

Voice detector for calculating the signal index A_i

Referring to FIG. 4, voice detector 40 calculates a signal index A_i(F) indicating the relative amount of voice energy in a frame. For this purpose, detector 40 groups the samples of the spectral estimate into several broad bands of frequencies. These broad bands have varying bandwidths and are distributed non-uniformly across the spectrum. As one example, the first broad band may be wider than the second broad band, and the second broad band wider than the third.

Referring to FIG. 5, the voice detector estimates the amount of background noise S_n in each broad band. For each broad band, voice detector 40 forms an aggregate estimate S_a(F) as follows.

Here, F is an integer identifying the current frame, and X and Y are the upper and lower frequencies of the broad band (step 210). As described in detail below, voice detector 40 compares the aggregate estimate for the current frame with the aggregate estimates for the band from previous frames to determine the amount of noise in the band (step 211). In general, a frame with a relatively low aggregate estimate is likely to contain little voice energy in the broad band. The aggregate estimate for such a frame therefore provides a reasonable estimate of the background noise in the broad band.

Before comparing the estimates with the noise thresholds, the voice detector must first place each aggregate estimate on a common scale. Otherwise, the estimate from the current frame would be scaled differently from the estimates from previous frames, and the comparison would be meaningless. The voice detector therefore unnormalizes the aggregate estimate, first calculating Log₂(g_q²) so that the normalization gain is on the same logarithmic scale as the aggregate estimate (step 212).

The unnormalized estimate is compared with an upper threshold S_r(F) and a lower threshold S_f(F), where F denotes the current frame (step 214). As described in detail below, these thresholds are calculated for each frame from the noise estimate of the previous frame. Because the first frame has no preceding frame, the upper threshold S_r(0) for the first frame is initialized to the value r, and the lower threshold S_f(0) is initialized to −f, where f is substantially larger than r (e.g., r = 1, f = 10).

If the aggregate estimate S_a(F) lies between the thresholds, the voice detector sets the noise estimate for the frame equal to the aggregate estimate (step 216).

S_n(F) = S_a(F) (4)

If the aggregate estimate is greater than the upper threshold, the noise estimate is set equal to the upper threshold (step 218).

S_n(F) = S_r(F) (5)

Finally, if the aggregate estimate is less than the lower threshold, the noise estimate is set equal to the lower threshold (step 220).

S_n(F) = S_f(F) (6)

Before calculating the noise estimate for the next frame, the voice detector adjusts the thresholds S_f(F+1) and S_r(F+1) for the next frame so that they straddle the noise estimate S_n(F) of the current frame (step 222). More specifically, for the next frame F+1, the voice detector calculates the upper noise threshold from the noise estimate of the current frame according to

S_r(F+1) = S_n(F) + r (7)

Similarly, the voice detector calculates the lower threshold as S_f(F+1) = S_n(F) − f.

Thus, for each new frame, the voice detector calculates a noise estimate and adjusts the upper and lower noise thresholds so that they straddle that noise estimate.
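The threshold tracking of equations (4) through (7) amounts to clamping each aggregate estimate between two thresholds that follow the previous noise estimate. The sketch below illustrates this for a single broad band; the function name `track_noise` and the use of plain floats are assumptions made for the example.

```python
def track_noise(aggregates, r=1.0, f=10.0):
    """Return the noise estimate S_n(F) for each frame of one broad band.

    aggregates : unnormalized aggregate estimates S_a(F), one per frame
    r, f       : upward and downward increments (e.g. r = 1, f = 10)
    """
    s_r, s_f = r, -f                    # initial thresholds S_r(0), S_f(0)
    noise = []
    for s_a in aggregates:
        s_n = min(max(s_a, s_f), s_r)   # equations (4), (5), (6)
        noise.append(s_n)
        s_r = s_n + r                   # equation (7)
        s_f = s_n - f                   # lower threshold for next frame
    return noise
```

Because r is small and f is large, the estimate in this sketch creeps upward slowly when the aggregate estimate is high and drops quickly when it is low.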

By adjusting the noise estimate over time, this technique adapts the noise estimate for a given broad band in the current frame toward the most recent minimum aggregate estimate for that broad band. For example, if a series of frames with no voice component in the broad band arrives, the aggregate estimates will be relatively small, since they reflect only the presence of background noise. If these aggregate estimates are below the lower threshold S_f, the technique rapidly reduces the noise estimate, in relatively large decrements f, until the noise estimate equals the relatively low aggregate estimate.

The voice detector can also follow an increase in background noise by incrementing the noise estimate upward. However, because a relatively small increment r is used, the noise estimate tends to remain near the most recent minimum aggregate estimate. This prevents the voice detector from mistaking transient voice energy for background noise.

After calculating the noise estimate for a frame, the voice detector subtracts the noise estimate S_n(F) from the aggregate estimate S_a(F) for that frame to obtain an estimate of the voice signal in the broad band (step 224). The voice detector also subtracts a threshold constant S_T from the aggregate estimate to generate a signal, S_out, indicating the degree to which the voice signal exceeds the threshold S_T (step 224). Finally, the voice detector calculates the index A_i(F) for the entire frame from the S_out signals and the normalization gain g_q (step 226).

Referring to FIG. 6, the calculation of A_i(F) from the set of S_out signals is now described in detail. The voice detector selects the largest S_out from among all the broad bands (step 246). If the selected value S_max is less than or equal to 0, it is likely that none of the broad bands contains a voice component (step 248). The voice detector therefore sets the index A_i(F) to 0, indicating that the broad bands contain only noise (step 250).

If S_max is greater than 0 (step 248), the voice detector calculates the index A_i(F) from the value of S_max. For this purpose, the voice detector scales S_max by a fixed gain G_s, i.e., S′_max = S_max * G_s, where G_s is preferably 0.2734 (step 252). The voice detector then calculates a correspondingly attenuated representation g_o of the normalization gain according to

g_o = (Log₂ g_q − T_g) * G_g (8)

where T_g and G_g are predetermined constants, for example T_g = 4096 and G_g = 0.15625 (step 254). If g_o is greater than S′_max, the voice detector assumes that S_max understates the voice energy in the frame. The voice detector therefore selects g_o as the index A_i(F) (step 256); otherwise, it selects S′_max as the index A_i(F) (step 256). Finally, the voice detector compares the selected index with a maximum index i_max (step 258). If the selected index exceeds the maximum index, the voice detector sets the index A_i(F) equal to the maximum value i_max (step 262). Otherwise, the selected value is used as the index (step 260).
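The index selection of FIG. 6 can be summarized in a short sketch. Several details are assumptions made for illustration: the function name `signal_index`, the default `i_max=7` (suggested by the later statement that an index of 7 means 100% audio), and the argument `log2_gq`, which is taken to be Log₂ g_q already expressed in whatever fixed-point scale makes T_g = 4096 meaningful. No rounding to an integer is applied, since the text does not describe one.

```python
def signal_index(s_out, log2_gq, g_s=0.2734, t_g=4096, g_g=0.15625, i_max=7):
    """Return the signal index A_i(F) for one frame.

    s_out   : S_out values, one per broad band
    log2_gq : Log2 of the normalization gain g_q (scale assumed, see above)
    """
    s_max = max(s_out)                 # step 246
    if s_max <= 0:                     # step 248: noise only
        return 0                       # step 250
    s_scaled = s_max * g_s             # step 252: S'_max
    g_o = (log2_gq - t_g) * g_g        # equation (8), step 254
    index = max(g_o, s_scaled)         # step 256: pick the larger value
    return min(index, i_max)           # steps 258-262: clip to i_max
```

Taking the larger of g_o and S′_max and then clipping is equivalent to the two comparisons described in steps 256 through 262.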

Bit rate estimator for calculating the step size Q(k) and classification information c(j)

Referring to FIG. 1(a), bit rate estimator 42 receives the index A_i(F) and the quantized log spectral estimates Log₂ S_q²(j) and, in response, calculates a step size Q(k) for each frequency coefficient and a class indicator c(j) for each band j of frequency coefficients. The class indicator c(j) is used by Huffman encoder 47 to select the appropriate code table.

The procedure used by bit rate estimator 42 to calculate the step sizes and class information is described in detail below. Referring to FIG. 7, the bit rate estimator includes a first table 70 containing a predetermined total allowable distortion value D_T for each value of the signal index A_i(F). Each value D_T represents the aggregate quantization distortion error for the entire frame. A second table 72 contains, for each value of the index A_i(F), a predetermined maximum number of bits r_max initially allocated to the frame (as described below, more bits can be allocated to the frame if necessary). The stored bit rate r_max increases with A_i(F); conversely, the stored distortion value D_T decreases with each increase in A_i(F). For example, when A_i(F) equals 0 (i.e., 100% noise), the tables provide a low bit rate and a high allowable distortion. When A_i(F) equals 7 (i.e., 100% audio), the tables provide a high bit rate and a low allowable distortion.

Based on the value of the index A_i(F), the first and second tables select the aggregate allowable quantization distortion D_T and the maximum allowable number of bits r_max. The allowable distortion D_T is supplied to a first distortion approximation module 74. Module 74 then calculates a first allowable sample distortion value d₁ representing the allowable distortion per spectral estimate. The sample distortion value d₁ is then used to derive c(j) and the block quantization step sizes Q(k).

Calculating the first approximation of the allowable distortion d₁

Referring to FIGS. 8(a) and 8(b), module 74 first calculates an initial sample distortion value d₁ by dividing the total allowable distortion D_T by the number P of spectral estimates (step 310). As described in detail below, only frequency coefficients from bands whose squared spectral estimate is sufficiently larger than the final quantization distortion value d_w are encoded for transmission. The allowable distortion value thus serves as a noise threshold that determines which coefficients are transmitted. Module 74 accordingly performs an antilog operation on the log spectral estimates Log₂ S_q²(j) to form the squared quantized spectral estimates S_q²(j). If a squared spectral estimate S_q²(j) is less than or equal to d₁, module 74 tentatively assumes that the constituent frequency coefficients of that spectral estimate (referred to here as "noisy samples") will not be encoded (step 312). Module 74 therefore increases the sample distortion value d₁ to reflect the fact that these constituent coefficients will not be encoded. More specifically, module 74 calculates the sum D_NT of all squared spectral estimates that are less than or equal to d₁ (step 314).

Module 74 then subtracts this sum from the aggregate allowable distortion D_T (step 316) and divides the result by the number N of remaining spectral estimates to calculate an adjusted sample distortion value d₁ (step 318).

d₁ = (D_T − D_NT)/N (10)

where N is the number of squared spectral estimates larger than the initial distortion value d₁. Because d₁ may now be larger than its initial value, module 74 compares each squared spectral estimate with the new d₁ to determine whether any further coefficients will not be encoded (step 320). If so, the estimator repeats the calculation of the adjusted sample distortion value d₁ (steps 322, 324). The search ends when no additional squared spectral estimates are less than or equal to the adjusted sample distortion value d₁ (step 320). The search also ends after a maximum permitted number of iterations (step 322). The resulting sample distortion value d₁ is then supplied to bit rate comparator 76 (FIG. 7) (step 326).
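The iterative search for d₁ can be sketched as follows. The function name `first_distortion`, the iteration cap `max_iters=8`, and the early exit when every band is classified as noisy are assumptions for illustration; the text only says that a maximum number of iterations exists.

```python
def first_distortion(d_total, s2, max_iters=8):
    """Return (d1, noisy_bands) following steps 310-326.

    d_total : total allowable distortion D_T for the frame
    s2      : squared quantized spectral estimates S_q^2(j), one per band
    """
    p = len(s2)
    d1 = d_total / p                                   # step 310
    noisy = {j for j, v in enumerate(s2) if v <= d1}   # step 312
    for _ in range(max_iters):                         # step 322
        d_nt = sum(s2[j] for j in noisy)               # step 314
        n = p - len(noisy)                             # bands still coded
        if n == 0:                                     # nothing left to code
            break
        d1 = (d_total - d_nt) / n                      # equation (10)
        new_noisy = {j for j, v in enumerate(s2) if v <= d1}
        if new_noisy == noisy:                         # step 320: converged
            break
        noisy = new_noisy
    return d1, sorted(noisy)
```

In the second test case below, raising d₁ from 10 to 13 pulls one more band under the threshold, so a further pass is needed before the set of noisy bands stops changing.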

Referring again to FIG. 7, comparator 76 calculates a provisional number of bits r per frame from the first sample distortion value d₁ and the log spectral estimates Log₂ S_q²(j) according to the following equation.

Comparator 76 then compares the estimated number of bits with the maximum allowable number of bits per frame, r_max. If r is less than the maximum r_max, comparator 76 instructs module 78 to calculate the step sizes of the frequency coefficients based on the first sample distortion value d₁. If r exceeds the maximum r_max, however, comparator 76 assumes that more distortion per estimate must be allowed in order to keep the number of bits below r_max. Comparator 76 therefore instructs the second distortion approximation module 80 to begin an iterative search for a new distortion value d₂ that yields a bit rate below r_max.

Second estimate of the allowable distortion d₂

Referring to FIG. 9, approximation module 80 first calculates a distortion increment value D_i satisfying the following equation (step 410).

The distortion increment value D_i is an estimate of the increase in the first sample distortion value needed to reduce the bit rate to the maximum value r_max or below. The approximation module therefore calculates a new distortion value d₂ satisfying (step 412)

Log₂ d₂ = D_i + Log₂ d₁ (13)

In the same manner as described above, the approximation module compares each squared spectral estimate S_q²(j) with d₂ and determines which estimates are less than or equal to the distortion, thereby predicting which frequency coefficients will not be encoded (step 414). Based on this prediction, module 80 recalculates the total number of bits r required for the frame according to the following equation (step 416).

Module 80 compares the rate r with the maximum value r_max to determine whether the new distortion value d₂ yields a bit rate below this maximum (step 418). If so, module 80 supplies the value d₂ to module 78 and instructs module 78 to calculate the quantization step sizes Q(j) based on both d₂ and d₁ (step 422). Otherwise, module 80 performs another iteration of the above process in an attempt to find a distortion d₂ that yields a sufficiently low bit rate r (steps 418, 420-428). If the maximum number of iterations has been attempted without finding such a distortion estimate, however, the search is terminated and the most recent value of d₂ is supplied to module 78 (steps 420, 422).

Calculating the step sizes and classification information from the distortion estimates

Referring to FIG. 10, module 78 combines the two estimates d₁ and d₂ of the allowable sample distortion to form a weighted distortion d_w satisfying

Log₂ d_w = a Log₂ d₂ + (1 − a) Log₂ d₁ (15)

where a is a constant weighting factor (step 510). (Note: if the value d₂ was not calculated, the weighted distortion d_w is set equal to the first estimate d₁.) Based on the weighted distortion estimate d_w, module 78 calculates a class parameter c′(j) for each spectral estimate S(j) according to the following equation (step 512).

The class parameter c′(j) is rounded up to the nearest integer value in the range 0 to 8 to form the class integer c(j) (step 520). A class value of 0 indicates that none of the coefficients in the class are to be encoded. The class value c(j) for each spectral estimate is therefore supplied to the quantizer to indicate which coefficients are to be encoded.

Class values greater than or equal to 1 are used to select the Huffman table used to encode the quantized coefficients. The class value c(j) for each spectral estimate is therefore also supplied to Huffman encoder 47.

Module 78 next calculates a band step size Q(j) for each spectral estimate, based on the value of the weighted distortion d_w and the value of the class parameter c′(j) for that estimate (steps 514-518). More specifically, for spectral estimates whose class parameter is less than 7.5, the step size Q(j) is calculated to satisfy the following equation.

Here, z is a constant offset value, for example −1.47156 (step 516). For spectral estimates whose class parameter c′(j) is greater than or equal to 7.5, the step size is selected to satisfy the following equation (step 518).

Next, the step size Q(k) for each frequency coefficient k in band j is derived by interpolating the band step sizes Q(j). First, each band step size is scaled down. Recall that the spectral estimate S(j) for each band j was calculated as the larger of 1) the sum of the squares of all coefficients in the band and 2) twice the logical OR of all coefficients in the band and the 10 neighboring samples (see FIGS. 2(a)-2(c)). The selected spectral estimate is therefore approximately equal to the total energy of the entire band. The quantization step size for each coefficient, however, should be chosen based on the average energy per coefficient within each band.

The band step size Q(j) is therefore scaled down by dividing it by the number of coefficients used to calculate that band's spectral estimate (i.e., 10 or 20, depending on which technique was selected for calculating S(j)). Bit rate estimator 42 then interpolates linearly between the log band step sizes Log₂ Q(j) to calculate the logarithm of each coefficient step size, Log₂ Q(k) (FIG. 11). Finally, the inverse of the coefficient step size is derived according to

1/Q(k) = 2^(−Log₂ Q(k)) (19)

Quantizing and encoding the frequency coefficients

Referring to FIG. 1(a), the class integers c(j) and the quantization step sizes Q(k) are supplied to coefficient quantizer 44. Coefficient quantizer 44 is a mid-tread quantizer that quantizes each frequency coefficient F(k) using the associated inverse step size 1/Q(k) to produce an index I(k). The indices I(k) are encoded for transmission by the combined operation of limit controller 45 and Huffman encoder 47.
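A mid-tread quantizer of the kind described above places a reconstruction level at zero, so small coefficients map to index 0. The sketch below multiplies each coefficient by the inverse step size and rounds to the nearest integer. The function names and the round-half-up rule are assumptions made for the example; the text does not specify how ties are handled.

```python
import math


def inverse_step(log2_q):
    """Return 1/Q(k) from the interpolated logarithm Log2 Q(k)."""
    return 2.0 ** (-log2_q)


def quantize(coeffs, log2_steps):
    """Mid-tread quantization: index I(k) = round(F(k) * 1/Q(k)).

    coeffs     : frequency coefficients F(k)
    log2_steps : Log2 Q(k) for each coefficient
    Ties are rounded upward via floor(x + 0.5) (an assumption).
    """
    return [math.floor(f * inverse_step(lq) + 0.5)
            for f, lq in zip(coeffs, log2_steps)]
```

Multiplying by a precomputed inverse step size avoids a division per coefficient, which is presumably why the inverse is derived in the log domain.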

Huffman encoder 47 includes, for each class c(j), a Huffman table containing the Huffman codes for the coefficients. The class integer c(j) is supplied to Huffman encoder 47 to select the appropriate Huffman table for the specified class.

In response to the index I(k), limit controller 45 generates a corresponding Huffman index I_h(k) identifying an entry in the selected Huffman table. Huffman encoder 47 then passes the selected Huffman code F_h(k) to bitstream controller 20 for transmission to the far end.

Typically, limit controller 45 simply passes the index I(k) through for use as the Huffman index I_h(k). However, the range of possible indices I(k) may exceed the input range of the corresponding Huffman table. Limit controller 45 therefore holds a maximum index value and a minimum index value for each class c(j), and compares each index I(k) with these values. If I(k) exceeds either the maximum or the minimum index value, limit controller 45 clips I(k) to the corresponding limit and supplies the clipped index to encoder 47 as the Huffman index.

Limit controller 45 keeps, for each frame, a running tally of the number of bits needed to transmit the Huffman codes. More specifically, the limit controller includes a bit count table corresponding to each Huffman table of Huffman encoder 47. Each entry in a bit count table gives the number of bits in the corresponding Huffman code stored in the Huffman table of encoder 47. For each Huffman index I_h(k) it generates, limit controller 45 therefore applies the Huffman index internally to the bit count table to determine the number of bits needed to transmit the corresponding Huffman code F_h(k) identified by I_h(k). That number of bits is then added to the running tally. If the running tally exceeds the maximum allowable number of bits, limit controller 45 ignores the remaining indices I(k). Limit controller 45 then prepares a unique Huffman index, identifying a unique Huffman code, to inform the far-end receiver that the allowable number of bits for the frame has been reached and that the remaining coefficients are not encoded for transmission.

To transmit this unique Huffman code, the limit controller must allocate several bits. To that end, the limit controller discards the most recent Huffman code and recalculates the running tally to determine whether enough bits are available to transmit the unique Huffman code. If not, the limit controller continues to discard the most recent Huffman codes until enough bits have been freed to transmit the unique Huffman code.
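The bit budgeting performed by the limit controller can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `limit_frame`, the single `bit_count` table (the text describes one table per Huffman table), and the `end_code_bits` parameter for the length of the unique end-of-frame code are all assumptions introduced here.

```python
from typing import List, Sequence, Tuple


def limit_frame(indices: Sequence[int],
                bit_count: Sequence[int],
                max_bits: int,
                end_code_bits: int) -> Tuple[List[int], bool, int]:
    """Return (kept_indices, truncated, bits_used) for one frame.

    bit_count[i] is the length in bits of the Huffman code for index i
    (hypothetical values; a single table is assumed for simplicity).
    """
    kept: List[int] = []
    tally = 0
    truncated = False
    for idx in indices:
        # Look up the code length and add it to the running tally.
        tally += bit_count[idx]
        kept.append(idx)
        if tally > max_bits:
            # Budget exceeded: the remaining indices are ignored.
            truncated = True
            break
    if truncated:
        # Discard the most recent codes until the unique end code fits.
        while kept and tally + end_code_bits > max_bits:
            tally -= bit_count[kept.pop()]
        tally += end_code_bits
    return kept, truncated, tally
```

With hypothetical code lengths `[2, 3, 5, 8]`, a budget of 12 bits and a 4-bit end code, the index sequence `[0, 1, 2, 3, 1]` overflows at the fourth code; the sketch then drops the fourth and third codes so that the end code fits.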

Reconstruction of the Microphone Signal at the Far End

Referring again to FIG. 1(b), the far-end audio decoder 28 reconstructs the microphone signal from the set of encoded signals. More specifically, a Huffman decoder 25 decodes the Huffman codes S_h(j) to reconstruct the encoded log spectral estimates S_e(j). A decoder 27 (identical to the decoder 41 of the audio encoder 16 in FIG. 1(a)) further decodes the encoded log spectral estimates to reconstruct the quantized spectral estimates log₂ S_q²(j).

The log spectral estimates log₂ S_q²(j) and the received signal index A_i(F) are supplied to a bit rate estimator 46, which repeats the derivation of the step sizes Q(k) and classes c(j) performed by the near-end bit rate estimator 42. The derived class information c(j) is supplied to a Huffman decoder 47 for decoding the Huffman codes F_h(k). The output of the Huffman decoder 47 is supplied to a coefficient reconstruction module 48, which reconstructs the original coefficients F_q(k) based on the derived quantization step sizes Q(k). The bit rate estimator 46 also supplies the class information to a coefficient fill-in module 50, informing it which coefficients were not encoded for transmission. Module 50 then estimates the missing coefficients using the reconstructed log spectral estimates log₂ S_q²(j).

Finally, the decoded coefficients F_q(k) and the estimated coefficients F_e(k) are supplied to a signal former 52, which converts the coefficient values back to the time domain and denormalizes the time-domain signal using the reconstructed normalization gain g_q.

More specifically, an inverse DCT module 51 merges the decoded coefficients F_q(k) with the estimated coefficients F_e(k) and converts the resulting frequency coefficient values into a time-domain signal m'(n). An inverse normalization module 53 restores the time-domain signal m'(n) to its original scale. The resulting microphone signal m(n) is supplied to an overlap decoder 55, which removes the redundant samples introduced by the window module 32 of the audio encoder 16 (FIG. 1(a)). A signal conditioner 57 filters the resulting microphone signal and converts it into an analog signal L(t) for driving the loudspeaker 29.
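The merge, inverse transform and denormalization steps can be illustrated with a short sketch. Several points are assumptions made only for this example: the transform is taken to be the inverse of an orthonormal DCT-II (this passage does not state the DCT type), the class is given per coefficient, denormalization is modeled as multiplication by a gain, and all function names are hypothetical. Overlap removal and signal conditioning are omitted.

```python
import math
from typing import List, Sequence


def merge_coefficients(classes: Sequence[int],
                       decoded: Sequence[float],
                       estimated: Sequence[float]) -> List[float]:
    """Use the fill-in estimate where the class is 0, else the decoded value."""
    return [e if c == 0 else d
            for c, d, e in zip(classes, decoded, estimated)]


def inverse_dct(coeffs: Sequence[float]) -> List[float]:
    """Inverse of an orthonormal DCT-II (ASSUMED transform type)."""
    n = len(coeffs)
    out = []
    for t in range(n):
        acc = coeffs[0] / math.sqrt(n)
        for k in range(1, n):
            acc += (math.sqrt(2.0 / n) * coeffs[k]
                    * math.cos(math.pi * (2 * t + 1) * k / (2 * n)))
        out.append(acc)
    return out


def reconstruct_frame(classes: Sequence[int],
                      decoded: Sequence[float],
                      estimated: Sequence[float],
                      gain: float) -> List[float]:
    """Merge coefficients, return to the time domain, restore the scale."""
    merged = merge_coefficients(classes, decoded, estimated)
    # ASSUMPTION: denormalization is a multiplication by the gain.
    return [gain * sample for sample in inverse_dct(merged)]
```

The naive O(N²) transform keeps the example self-contained; a real decoder would use a fast transform.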

Estimation of Uncoded Coefficients

As described above, the human ear is less likely to notice distortion of an audio signal in spectral regions where the signal has relatively high energy. The bit rate estimator 42 (FIG. 1) therefore adjusts the quantization step size so that low-energy coefficients are quantized finely and high-energy coefficients coarsely. In apparent contradiction to this approach, however, the bit rate estimator simply discards coefficients with very low energy levels.

The absence of these coefficients would produce a considerable amount of audible artifacts. The coefficient fill-in module 50 therefore creates, from the spectral estimates, a coefficient estimate for each discarded coefficient. In doing so, the fill-in module considers the level of the signal index A_i for the frame containing the uncoded coefficients. For example, if the signal index is low (indicating that the frame consists mainly of background noise), the fill-in module assumes that the missing coefficients represent background noise, and it creates the coefficient estimates mainly from a measure of the background noise in the frame. If, however, the signal index is high (indicating that the frame consists mainly of a speech signal), the fill-in module assumes that the missing coefficients represent the speech signal, and it creates the coefficient estimates mainly from the spectral estimate for the frequency band that contains the frequency of the missing coefficient.

Referring to FIG. 12, the coefficient fill-in module 50 (FIG. 1(b)) includes a coefficient estimator module 82 for each band. Each estimator module 82 includes a noise floor module 84 for approximating, for each frame F, the amount of background noise in frequency band j. The noise estimate S_n(j,F) is derived by comparing the log spectral estimate log₂ S_q²(j) of the current frame with a noise estimate derived from the spectral estimates of previous frames. An adder 91 adds the log gain log₂ g_q² to the log spectral estimate to denormalize it. The denormalized estimate S_u(j,F) is supplied to a comparator 99, which compares S_u(j,F) with the noise estimate S_n(j,F−1) calculated for the previous frame F−1. (S_n is initialized to zero for the first frame.) If S_u(j,F) is greater than the previous noise estimate S_n(j,F−1), the noise estimate for the current frame F is calculated as follows:

$$S_n(j,F) = S_n(j,F-1) + t_r \tag{20}$$

where t_r is a rise time constant provided by table 100. More specifically, table 100 provides a unique t_r for each value of the signal index A_i(F). (For example, a relatively long time constant is selected for A_i(F) = 0; as A_i(F) increases, the selected time constant decreases.)

If S_u(j,F) is less than the previous noise estimate S_n(j,F−1), the noise estimate for the current frame is calculated as follows:

$$S_n(j,F) = S_n(j,F-1) - t_f \tag{21}$$

where t_f is a fall time constant provided by table 102. Like table 100, table 102 provides a unique time constant t_f for each value of the index A_i(F). An adder 93 normalizes the resulting noise estimate S_n(j,F) by subtracting the log gain log₂ g_q². The output of adder 93 is supplied to a log inverter 64, which computes the antilogarithm according to equation (22) to provide a normalized noise estimate S_nn(j,F).

[Equation (22) is not reproduced in this text.]
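The noise floor update for a single band can be sketched as below. This is an illustration under stated assumptions: the fall step is applied as a decrease of the estimate (as the claims describe it), the estimate is left unchanged when S_u equals the previous estimate (the text does not cover that case), and the table contents are hypothetical placeholders. The antilogarithm step is left out because its equation is not available here.

```python
from typing import List, Sequence


def update_noise_floor(s_u: float, s_n_prev: float,
                       t_r: float, t_f: float) -> float:
    """One update of the log-domain noise estimate for one band."""
    if s_u > s_n_prev:
        return s_n_prev + t_r   # rising case, equation (20)
    if s_u < s_n_prev:
        return s_n_prev - t_f   # falling case (ASSUMED sign convention)
    return s_n_prev             # equal case: ASSUMED unchanged


def track_band(log_spectral: Sequence[float],
               log_gains: Sequence[float],
               signal_indices: Sequence[int],
               rise_table: Sequence[float],
               fall_table: Sequence[float]) -> List[float]:
    """Return the normalized log noise estimate for each frame of one band."""
    s_n = 0.0  # the estimate starts at zero for the first frame
    normalized = []
    for log_sq, log_g, a_i in zip(log_spectral, log_gains, signal_indices):
        s_u = log_sq + log_g                      # adder 91: denormalize
        s_n = update_noise_floor(s_u, s_n, rise_table[a_i], fall_table[a_i])
        normalized.append(s_n - log_g)            # adder 93: renormalize
    return normalized
```

Indexing both tables by the signal index mirrors the role of tables 100 and 102; the actual table values are not given in this passage.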

For each frame, the normalized band noise estimate S_nn(j,F) and the spectral estimate S_q(j,F) are supplied to a weighting function 86, which produces a weighted sum of these two values. The weighting function 86 includes a first table 88 containing a noise weighting coefficient C_n(F) for each value of the signal index A_i(F). Similarly, a second table 90 contains a speech weighting coefficient C_a(F) for each index A_i(F). In response to the current value of the audio index A_i(F), table 88 supplies the corresponding noise weighting coefficient C_n(F) to a multiplier 92. Multiplier 92 computes the product of C_n(F) and S_nn(j,F) to form a weighted noise value S_nw(j,F). Similarly, table 90 is used to compute a weighted speech value S_vw(j,F), where S_vw = S_q(j,F) C_a. The weighted values are supplied to a former 98, which computes the weighted estimate W according to the following equation:

$$W = (S_{vw} + S_{nw})^2 \tag{23}$$

The weighting coefficients C_a and C_n stored in tables 90 and 88 are related by C_a = 1 − C_n, where the value of C_n ranges from 0 (for A_i = 7) to 1 (for A_i = 0). Thus, during silence the noise estimate carries more weight, and as the audio index increases, more weight is given to the spectral estimate.
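The weighting of equation (23) can be written out as a small sketch. The text gives only the two endpoints of C_n (1 at A_i = 0 and 0 at A_i = 7); the linear ramp between them used in `noise_weight` is an assumption for illustration, as are the function names.

```python
def noise_weight(a_i: int, max_index: int = 7) -> float:
    """Noise weighting coefficient C_n for signal index a_i.

    ASSUMPTION: linear ramp between the stated endpoints
    C_n = 1 (a_i = 0) and C_n = 0 (a_i = max_index).
    """
    return 1.0 - a_i / max_index


def weighted_estimate(s_q: float, s_nn: float, a_i: int) -> float:
    """Weighted estimate W = (S_vw + S_nw)^2 with C_a = 1 - C_n."""
    c_n = noise_weight(a_i)
    c_a = 1.0 - c_n
    s_vw = s_q * c_a    # weighted speech value
    s_nw = s_nn * c_n   # weighted noise value
    return (s_vw + s_nw) ** 2
```

At A_i = 0 the result depends only on the noise estimate, and at A_i = 7 only on the spectral estimate, which matches the behavior described above.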

This weighted estimate is supplied to the signal former 52, where it is used to compute an estimated frequency coefficient F_e(k) for each of the ten uncoded frequency coefficients corresponding to the spectral estimate.

More specifically, the weighted estimate W is used to control the "fill-in" level for each missing frequency coefficient (i.e., each coefficient of class 0). The fill-in consists of scaling the output of a random number generator (with a uniform distribution) and inserting the result F_e in place of the missing frequency coefficient. The following equation is used to generate the fill-in for each missing frequency coefficient.

[The fill-in equation is not reproduced in this text.]

In the above equation, noise is the output of the random number generator at a given instant (or coefficient sample), the range of the random number generator is twice the value n, and e is a constant, for example 3. A new noise value is generated for each missing frequency coefficient.
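The structure of the fill-in step (a fresh uniform noise value for each class-0 coefficient, scaled and inserted in place of the missing coefficient) can be sketched as follows. Because the fill-in equation itself is not available in this text, the scaling is passed in as a caller-supplied function; `placeholder_scale` is purely hypothetical and is not the patent's equation. Drawing the noise uniformly from [-n, n] is one reading of "the range is twice the value n" and is likewise an assumption.

```python
import math
import random
from typing import Callable, List, Sequence


def placeholder_scale(noise: float, weight: float, n: float) -> float:
    """HYPOTHETICAL scaling used only to make the sketch runnable."""
    return (noise / n) * math.sqrt(weight)


def fill_in(coeffs: Sequence[float],
            classes: Sequence[int],
            weights: Sequence[float],
            n: float,
            scale: Callable[[float, float, float], float],
            rng: random.Random) -> List[float]:
    """Replace each class-0 coefficient with scaled uniform noise."""
    out = list(coeffs)
    for k, cls in enumerate(classes):
        if cls == 0:
            # A new noise value is drawn for every missing coefficient.
            noise = rng.uniform(-n, n)  # ASSUMED symmetric range of width 2n
            out[k] = scale(noise, weights[k], n)
    return out
```

Coefficients with a nonzero class pass through unchanged, so the sketch can be applied directly to the output of the coefficient reconstruction stage.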

Additions, deletions, and other modifications of the preferred embodiments of the invention will be apparent to those skilled in the art and are within the scope of the following claims.

Continuation of the front page (56) References: JP-A-56-66929 (JP, A); JP-A-64-24513 (JP, A); JP-A-63-151269 (JP, A); JP-A-56-46300 (JP, A); JP-B-60-37658 (JP, B1); JP-B-2-52280 (JP, B2) (58) Fields searched (Int. Cl.⁷, DB name): G10L 19/00

Claims

(57) [Claims]

1. A method for allocating transmission bits used to transmit samples of a digital signal, the method comprising: selecting a collective allowable quantization distortion value representing a collective allowable quantization distortion error for a frame comprising a plurality of samples of the digital signal; selecting a set of samples from the frame such that each sample of the set is greater than a noise threshold; calculating, for each sample of the set, a sample quantization distortion value representing an allowable quantization distortion error of that sample, such that the sum of the sample quantization distortion values of all samples of the set is approximately equal to the collective allowable quantization distortion value; selecting, for each sample of the set, a quantization step size that yields a quantization distortion error substantially equal to the corresponding sample quantization distortion value; and quantizing the sample using the quantization step size.

2. The allocation method according to claim 1, wherein selecting the quantization step size comprises calculating the quantization step size based at least in part on a difference between the sample and the corresponding sample quantization distortion value.

3. The allocation method according to claim 1, wherein the digital signal includes a noise component and a signal component, and selecting the collective allowable quantization distortion value comprises: creating, for at least one sample of the frame, a signal index representing the magnitude of the signal component relative to the magnitude of the noise component; and selecting the collective allowable quantization distortion value based on the signal index.

4. The allocation method according to claim 1, wherein calculating the sample quantization distortion value comprises dividing the collective allowable quantization distortion value by the number of samples in the frame to form a first sample distortion value.

5. The allocation method according to claim 4, wherein selecting the set of samples comprises selecting a trial set of samples of the digital signal such that each sample of the trial set is greater than a noise threshold determined at least in part by the first sample distortion value, and calculating the sample quantization distortion value comprises adjusting the first sample distortion value by an amount determined by the difference between the first sample distortion value and at least one sample excluded from the trial set.

6. The allocation method according to claim 5, wherein selecting the trial set of samples and adjusting the first sample distortion value comprise: a) identifying noisy samples of the trial set, each having a value less than the current value of the first sample distortion value; b) removing the noisy samples, if any, from the trial set; c) if noisy samples were removed, increasing the first sample distortion value by an amount determined by the difference between the first sample distortion value and at least one of the noisy samples; and repeating steps a), b) and c) until an adjusted first sample distortion value is reached at which step a) identifies no further noisy samples in the trial set.

7. The method of claim 6, further comprising the step of aborting said adjustment if steps a), b) and c) are repeated a certain maximum number of times.

8. The allocation method according to claim 7, further comprising: after the adjustment is complete, estimating the number of bits required to transmit all samples of the trial set; comparing the estimated number of bits with a maximum number of bits; and, if the estimated number of bits is less than or equal to the maximum number of bits, selecting a final noise threshold based on the adjusted first sample distortion value.

9. The allocation method according to claim 8, further comprising, if the estimated number of bits exceeds the maximum number of bits, generating a second sample distortion value by: d) selecting a second trial set of samples of the digital signal such that each sample of the second trial set has a value greater than the second sample distortion value; e) estimating the number of bits required to transmit all samples of the second trial set; f) comparing the estimated number of bits with the maximum number of bits; and g) if the estimated number of bits is greater than the maximum number of bits, increasing the second sample distortion value by a determined amount.

10. The allocation method according to claim 9, further comprising repeating steps d) to g) until a second sample distortion value is reached at which the estimated number of bits is less than or equal to the maximum number of bits.

11. The allocation method according to claim 10, further comprising: calculating the sample distortion value from the adjusted first sample distortion value and the second sample distortion value; and selecting a final set of samples of the digital signal such that each sample of the final set has a value greater than a final threshold determined by the corresponding sample distortion value of that sample.

12. A communication method for communicating a digital signal including a noise component and a signal component, comprising: generating an estimated signal that represents the digital signal and has fewer samples than the digital signal; creating, for at least one sample of the estimated signal, a signal index representing the magnitude of the signal component; selecting, based on the signal index, samples of the digital signal having a sufficiently large signal component; transmitting the selected samples of the digital signal; transmitting the samples of the estimated signal; and reconstructing the digital signal from the transmitted selected samples and the estimated signal.

13. The communication method according to claim 12, wherein the digital signal is a frequency-domain audio signal representing audio information to be transmitted, each sample of the estimated signal is a spectral estimate of the frequency-domain signal in a corresponding frequency band, and reconstructing the frequency-domain audio signal comprises: generating a random number for each unselected sample of the frequency-domain audio signal; creating, from at least one of the spectral estimates, a noise estimate of the value of the noise component of the unselected sample; generating a scaling factor based on the noise estimate; and scaling the random number according to the scaling factor to form a reconstructed sample representing the unselected sample.

14. The communication method according to claim 13, wherein the estimated signal and the frequency-domain signal each comprise a series of frames, each frame representing the audio information over a specified time window, and creating the noise estimate for a current frame comprises: creating, for a previous frame of the estimated signal, an initial noise estimate from at least one spectral estimate representing a frequency band of the previous frame; selecting a rise time constant t_r and a fall time constant t_f based on the value of the signal index; adding the rise time constant to the initial noise estimate to form an upper threshold; subtracting the fall time constant from the initial noise estimate to form a lower threshold; comparing a current spectral estimate, representing the frequency band in the current frame, with the upper and lower thresholds; if the current spectral estimate lies between the thresholds, setting the current noise estimate equal to the current spectral estimate; if the current spectral estimate is less than the lower threshold, decreasing the initial noise estimate by the fall time constant to form the current noise estimate; and if the current spectral estimate is greater than the upper threshold, increasing the initial noise estimate by the rise time constant to form the current noise estimate.

15. The method of claim 14, wherein generating a scaling factor includes generating a weighted sum of the current noise estimate and the current spectral estimate.

16. The communication method according to claim 15, wherein generating the weighted sum comprises: selecting a noise coefficient based on the value of the signal index; selecting a speech coefficient based on the value of the signal index; multiplying the current noise estimate by the noise coefficient; multiplying the current spectral estimate by the speech coefficient; adding the results of the multiplications; and forming the scaling factor from the result of the addition.

17. An encoding apparatus for allocating transmission bits used to transmit a plurality of samples of a digital signal, comprising: means for selecting a collective allowable quantization distortion value representing a collective allowable quantization distortion error for a frame comprising a plurality of samples of the digital signal; means for selecting a set of samples from the frame such that each sample of the set is greater than a noise threshold; means for calculating, for each sample of the set, a sample quantization distortion value representing an allowable quantization distortion error of that sample, such that the sum of the sample quantization distortion values of all samples of the set is approximately equal to the collective allowable quantization distortion value; means for selecting, for each sample of the set, a quantization step size that yields a quantization distortion error approximately equal to the corresponding sample quantization distortion value; and means for quantizing the sample using the quantization step size.

18. The encoding apparatus according to claim 17, wherein the means for selecting a quantization step size comprises means for calculating, for each sample to be transmitted, the quantization step size based at least in part on a difference between the sample and the corresponding sample quantization distortion value.

19. The encoding apparatus according to claim 17, wherein the digital signal includes a noise component and a signal component, and the means for selecting the collective allowable quantization distortion value comprises: means for generating, for at least one sample of the frame, a signal index representing the magnitude of the signal component relative to the magnitude of the noise component; and means for selecting the collective allowable quantization distortion value based on the signal index.

20. The encoding apparatus according to claim 17, wherein the means for calculating the sample quantization distortion value comprises means for dividing the collective allowable quantization distortion value by the number of samples in the frame to form a first sample distortion value.

21. The encoding apparatus according to claim 20, wherein the means for selecting the set of samples comprises means for selecting a trial set of samples of the digital signal such that each sample of the trial set is greater than a noise threshold determined at least in part by the first sample distortion value, and the means for calculating the sample quantization distortion value comprises means for adjusting the first sample distortion value by an amount determined by the difference between the first sample distortion value and at least one sample excluded from the trial set.

22. The encoding apparatus according to claim 21, wherein the means for selecting the trial set of samples and the means for adjusting the first sample distortion value comprise: means for identifying noisy samples of the trial set, each having a value less than the current value of the first sample distortion value; means for removing the noisy samples, if any, from the trial set; means for increasing, if noisy samples were removed, the first sample distortion value by an amount determined by the difference between the first sample distortion value and at least one of the noisy samples; and means for repeatedly removing noisy samples and adjusting the first sample distortion value until an adjusted first sample distortion value is reached at which the trial set contains no further samples smaller than the adjusted first sample distortion value.

23. The encoding apparatus according to claim 22, further comprising means for terminating said adjustment when said first sample distortion value has been adjusted a maximum number of times.

24. The encoding apparatus according to claim 23, further comprising: means for estimating, after the adjustment is complete, the number of bits required to transmit all samples of the trial set; means for comparing the estimated number of bits with a maximum number of bits; and means for selecting a final noise threshold based on the adjusted first sample distortion value if the estimated number of bits is less than or equal to the maximum number of bits.

25. The encoding apparatus according to claim 24, further comprising means for generating a second sample distortion value if the estimated number of bits exceeds the maximum number of bits, including: means for selecting a second trial set of samples of the digital signal such that each sample of the second trial set has a value greater than the second sample distortion value; means for estimating the number of bits required to transmit all samples of the second trial set; means for comparing the estimated number of bits with the maximum number of bits; and means for increasing the second sample distortion value by a determined amount if the estimated number of bits is greater than the maximum number of bits.

26. The encoding apparatus according to claim 25, further comprising means for repeatedly adjusting the second sample distortion value, reselecting the second trial set of samples, and estimating the number of bits for all samples of the second trial set, until a second sample distortion value is reached at which the estimated number of bits is less than or equal to the maximum number of bits.

27. The encoding apparatus according to claim 26, further comprising: means for calculating the sample distortion value from the adjusted first sample distortion value and the second sample distortion value; and means for selecting a final set of samples of the digital signal such that each sample of the final set has a value greater than a final threshold determined by the corresponding sample distortion value of that sample.

28. An encoding apparatus for transmitting a digital signal including a noise component and a signal component, comprising: means for generating an estimated signal that represents the digital signal and has fewer samples than the digital signal; means for generating, for at least one sample, a signal index representing the ratio of the value of the signal component to the value of the noise component; means for selecting, based on the signal index, samples of the digital signal having a sufficiently large signal component; means for transmitting the selected samples of the digital signal; means for transmitting the samples of the estimated signal; and means for reconstructing the digital signal from the transmitted selected samples and the estimated signal.

29. The encoding apparatus according to claim 28, wherein the digital signal is a frequency-domain audio signal representing audio information to be transmitted, each sample of the estimated signal is a spectral estimate of the frequency-domain signal in a corresponding frequency band, and the means for reconstructing the frequency-domain audio signal comprises: means for generating a random number for each unselected sample of the frequency-domain audio signal; means for generating, from one or more of the spectral estimates, a noise estimate of the value of the noise component of the unselected sample; means for generating a scaling factor based on the noise estimate; and means for scaling the random number according to the scaling factor to generate a reconstructed sample representing the unselected sample.

30. The encoding apparatus according to claim 29, wherein the estimated signal and the frequency-domain audio signal each comprise a series of frames, each frame representing the audio information over a specified time window, and the means for generating the noise estimate for a current frame comprises: means for generating, for a previous frame of the estimated signal, an initial noise estimate from at least one spectral estimate representing a frequency band of the previous frame; means for selecting a rise time constant t_r and a fall time constant t_f based on the value of the signal index; means for adding the rise time constant to the initial noise estimate to form an upper threshold; means for subtracting the fall time constant from the initial noise estimate to form a lower threshold; means for comparing a current spectral estimate, representing the frequency band in the current frame, with the upper and lower thresholds; means for setting the current noise estimate equal to the current spectral estimate when the current spectral estimate lies between the thresholds; means for decreasing the initial noise estimate by the fall time constant to form the current noise estimate when the current spectral estimate is lower than the lower threshold; and means for increasing the initial noise estimate by the rise time constant to form the current noise estimate when the current spectral estimate is higher than the upper threshold.

31. The encoding apparatus according to claim 30, wherein said means for generating a scaling factor includes means for generating a weighted sum of said current noise estimate and said current spectral estimate.

32. The encoding apparatus according to claim 31, wherein the means for generating the weighted sum comprises: means for selecting a noise coefficient based on the value of the signal index; means for selecting a speech coefficient based on the value of the signal index; means for multiplying the current noise estimate by the noise coefficient; means for multiplying the current spectral estimate by the speech coefficient; means for adding the results of the multiplications; and means for forming the scaling factor from the result of the addition.

33. An estimation method for estimating the energy of a speech component for each of a series of frames of a digital signal, comprising the steps of: for a first frequency band of a first frame, creating a first frame noise estimate that represents the noise energy in the first frequency band of the first frame; when the signal energy value is greater than the first frame noise estimate, adding a small increment to the first frame noise estimate to form a second frame noise estimate; when the signal energy value is less than the first frame noise estimate, subtracting a large decrement from the first frame noise estimate to form a second frame noise estimate; and subtracting the second frame noise estimate from the signal energy value in the first frequency band to form a speech estimate that represents the energy of the speech component of the first frequency band.

34. The estimation method according to claim 33, further comprising the steps of: creating a speech estimate for each of a plurality of frequency bands of the second frame; for each speech estimate, subtracting a threshold S_T from the speech estimate to form a bandwidth estimate S_out; selecting the largest of the bandwidth estimates S_out as a maximum bandwidth estimate S_max; and creating, based on the maximum bandwidth estimate S_max, a frame speech estimate representing the energy of the speech component over the entire band of the second frame.

35. The estimation method according to claim 33, wherein creating the frame speech estimate comprises the step of setting the frame speech estimate equal to zero if the maximum bandwidth estimate S_max is less than or equal to zero.

36. The method of claim 33, wherein the samples of each frame are normalized by a frame normalization factor g, and creating the signal energy value representing the energy of the digital signal comprises denormalizing the second frame of the digital signal so that the denormalized signal is on the same scale as the first frame noise estimate.

37. The method according to claim 36, wherein creating the frame speech estimate comprises the steps of: multiplying the maximum bandwidth estimate S_max by a weighting factor to form a scaled value S'_max; creating an attenuated gain g_o representing a scaled value of the normalization gain g; and, if the maximum bandwidth estimate S_max is greater than 0, selecting the larger of S'_max and g_o as the frame speech estimate.
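
The method of claims 33 to 37 can be sketched end to end in Python as follows. This is one reading under stated assumptions, not a definitive implementation: all function and parameter names are illustrative, the per-band signal energies are assumed to be already denormalized (the step of claim 36), S_out is taken to be the speech estimate minus the threshold S_T, and the attenuated gain g_o is modeled as the normalization gain g multiplied by a hypothetical `attenuation` factor.

```python
def update_band_noise(prev_noise, signal_energy, small_inc, large_dec):
    """Asymmetric noise update: slow rise, fast fall."""
    if signal_energy > prev_noise:
        return prev_noise + small_inc
    if signal_energy < prev_noise:
        return prev_noise - large_dec
    return prev_noise


def frame_speech_estimate(signal_energies, prev_noises, small_inc, large_dec,
                          s_t, weight, g, attenuation):
    """Return (frame speech estimate, updated per-band noise estimates)."""
    new_noises = []
    s_out = []
    for energy, noise in zip(signal_energies, prev_noises):
        new_noise = update_band_noise(noise, energy, small_inc, large_dec)
        new_noises.append(new_noise)
        speech = energy - new_noise   # per-band speech estimate
        s_out.append(speech - s_t)    # assumed direction: speech minus S_T

    s_max = max(s_out)                # maximum bandwidth estimate S_max
    if s_max <= 0:
        return 0.0, new_noises        # no band exceeds the threshold

    scaled = weight * s_max           # scaled value S'_max
    g_o = attenuation * g             # attenuated gain g_o (assumed form)
    return max(scaled, g_o), new_noises
```

A small increment paired with a large decrement lets the noise estimate drop quickly when a band goes quiet while rising only slowly during speech, so that speech energy is not absorbed into the noise estimate.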