JP4618823B2

JP4618823B2 - Signal encoding apparatus and method

Info

Publication number: JP4618823B2
Application number: JP30150498A
Authority: JP
Inventors: 淳松本; 正之西口; 堅一牧野
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1998-10-22
Filing date: 1998-10-22
Publication date: 2011-01-26
Anticipated expiration: 2018-10-22
Also published as: JP2000132195A

Abstract

PROBLEM TO BE SOLVED: To enhance encoding efficiency, in encoding that uses an orthogonal transform, by removing the features or correlation of the time-axis signal waveform through LPC (linear predictive coding) analysis or pitch analysis prior to the orthogonal transform. SOLUTION: The time-axis input signal from an input terminal 10 is sent to a normalization circuit section 11 and an LPC analysis circuit 39. The normalization circuit section 11 removes the correlation of the signal waveform with an LPC inverse filter 12 and a pitch inverse filter 13 to obtain a residual, which is sent to an orthogonal transform circuit 25. The LPC coefficients from the LPC analysis circuit 39 and the pitch parameters from a pitch analysis circuit 15 are sent to a bit allocation calculation circuit 41. A coefficient quantization section 40 quantizes the coefficients from the orthogonal transform circuit 25 according to the number of bits allocated by the bit allocation calculation circuit 41.

Description

【０００１】
【発明の属する技術分野】
本発明は、入力信号を時間軸／周波数軸変換して量子化を行う信号符号化装置及び方法に関し、特に、オーディオ信号を高能率符号化する場合に好適な信号符号化装置及び方法に関する。
【０００２】
【従来の技術】
従来において、オーディオ信号（音声信号や音楽信号を含む）の時間領域や周波数領域における統計的性質と人間の聴感上の特性を利用して信号圧縮を行うような符号化方法が種々知られている。この符号化方法としては、大別して時間領域での符号化、周波数領域での符号化、分析合成符号化等が挙げられる。
【０００３】
【発明が解決しようとする課題】
ところで、時間軸の入力信号を周波数軸の信号に直交変換して符号化を行う変換符号化においては、入力信号の時間軸波形の特徴を除去して変換符号化することが、符号化効率を高める上で望ましい。
【０００４】
また、直交変換された周波数軸上の係数データを量子化する際に、重み付けを施してビット割当をすることが多く行われているが、このビット割当のための情報を付加情報あるいはサイドインフォメーションとして伝送するのは、ビットレートが増加することになり好ましくない。
【０００５】
本発明は、このような実情に鑑みてなされたものであり、直交変換に先立って時間軸波形信号の特徴あるいは相関性を除去し、符号化効率を高めることができると共に、量子化の際のビット割当の情報を直接送らなくともデコーダ側でビット割当を再現できビットレート低減に貢献し得るような信号符号化装置及び方法の提供を目的とする。
【０００６】
【課題を解決するための手段】
本発明は、上述した課題を解決するために、時間軸上の入力信号に対して線形予測符号化（ＬＰＣ）分析及びピッチ分析を行うことにより得られた情報に基づいて残差を取り出し、量子化された時間軸上のエンベロープ値によりゲインの平滑化のためのゲインコントロールを行うことにより正規化し、この時間軸で平滑化された出力に対して直交変換を施し、この直交変換された出力を量子化する。
【０００７】
ここで、上記直交変換は、改良離散コサイン変換（ＭＤＣＴ）により入力された時間軸信号を周波数軸の係数データに変換することが好ましい。また、上記正規化は、上記入力信号をＬＰＣ分析して得られたＬＰＣ係数に基づき上記入力信号のＬＰＣ予測残差を出力し、上記ＬＰＣ予測残差をピッチ分析して得られたピッチパラメータに基づき上記ＬＰＣ予測残差のピッチの相関性を除去することが好ましい。さらに、上記量子化手段は、上記ＬＰＣ分析結果及び上記ピッチ分析結果に基づいて決定される割当ビット数に従って量子化を行うことが好ましい。
【０００８】
【発明の実施の形態】
以下、本発明に係る実施の形態について、図面を参照しながら説明する。
図１は、本発明に係る実施の形態となる信号符号化装置の概略構成を示すブロック図である。
【０００９】
この図１において、入力端子１０には時間軸上の波形信号、例えばディジタルオーディオ信号が入力される。具体的には、例えばサンプリング周波数Ｆs が１６ｋHzで０〜８ｋHz程度のいわゆる広帯域音声信号が挙げられるが、これに限定されるものではない。
【００１０】
入力端子１０からの入力信号は、正規化回路部１１に送られる。この正規化回路部１１は、白色化回路とも呼ばれ、入力された時間波形信号の特徴を抽出して予測残差を取り出すような白色化を行うものである。時間波形の白色化は、線形若しくは非線形の予測により行うことができ、例えばＬＰＣ（線形予測符号化：Linear Predictive Coding）及びピッチ分析により入力時間波形信号を白色化することができる。
【００１１】
図１の例においては、正規化（白色化）回路部１１は、ＬＰＣ逆フィルタ１２とピッチ逆フィルタ１３とから成っており、入力端子１０からの入力信号をＬＰＣ分析回路３９に送ってＬＰＣ分析し、分析の結果得られたＬＰＣ係数（いわゆるαパラメータ）をＬＰＣ逆フィルタ１２に送ってＬＰＣ予測残差を取り出すようにしている。ＬＰＣ逆フィルタ１２からのＬＰＣ予測残差は、ピッチ分析回路１５及びピッチ逆フィルタ１３に送られ、ピッチ分析回路１５では後述するようなピッチ分析によりピッチパラメータ（ピッチゲイン及びピッチラグ）が取り出されてピッチ逆フィルタ１３に送られている。ピッチ逆フィルタ１３では、上記ＬＰＣ予測残差からピッチ相関を除去してピッチ残差を求め、直交変換回路２５に送っている。また、ＬＰＣ分析回路３９からのＬＰＣ係数及びピッチ分析回路１５のピッチパラメータは、量子化の際のビット割当（ビットアロケーション）を決定するためのビット割当算出回路４１に送られている。
【００１２】
正規化回路部１１からの白色化された時間波形信号、すなわちＬＰＣ残差のピッチ残差は、時間軸／周波数軸変換（Ｔ／Ｆ mapping）処理を行う直交変換回路部２５に送られて、周波数軸の信号（係数データ）に変換される。このＴ／Ｆ変換としては、例えばＤＣＴ（離散コサイン変換：Discrete Cosine Transform）、ＭＤＣＴ（改良ＤＣＴ：Modified Discrete Cosine Transform）、ＦＦＴ（高速フーリエ変換：Fast Fourier Transform）等が多く用いられる。直交変換回路部２５から得られたＭＤＣＴ係数、ＦＦＴ係数等のパラメータあるいは係数データは、係数量子化部４０に送られて、ＳＱ（スカラ量子化）あるいはＶＱ（ベクトル量子化）等が施される。この係数量子化を効率的に行うためには、各係数に対する量子化のビット割当（ビットアロケーション）を決定する必要がある。このビット割当は、聴覚マスキングモデル、あるいは上記正規化回路部１１での白色化の際に得られるＬＰＣ係数やピッチパラメータ等の各種パラメータ、あるいは上記係数データから計算されるバークスケールファクタ等に基づいて算出することができる。なお、バークスケールファクタとしては、直交変換されて得られた係数を、人間の聴覚特性に合わせて高域ほどバンド幅が広くなるような周波数帯域、いわゆる臨界帯域（クリティカルバンド）に分割したときの、各クリティカルバンド毎のピーク値あるいはｒｍｓ（二乗平均の平方根）値等が用いられる。
【００１３】
本実施の形態においては、ＬＰＣ係数、ピッチパラメータ、及びバークスケールファクタのみによってビット割当を算出するように規定しておき、これらのパラメータのみを送ることによって、デコーダ側でエンコーダ側と同一のビット割当が再現でき、割当ビット数そのものを表す付加情報（サイドインフォメーション）を送る必要がなく、ビットレート低減に貢献できる。
【００１４】
なお、正規化回路部１１のＬＰＣ逆フィルタ１２で用いるＬＰＣ係数（αパラメータ）や、ピッチ逆フィルタ１３で用いるピッチパラメータ（のピッチゲイン）については、デコーダ側での再現性を考慮して、後述するように量子化された値を用いている。
【００１５】
この図１の信号符号化装置は、ハードウェア構成として示しているが、いわゆるＤＳＰ（ディジタル信号プロセッサ）等を用いてソフトウェア的に実現することも可能である。
【００１６】
次に、図２を参照しながら、上述した本発明の実施の形態のより具体的な構成例としてのオーディオ信号符号化装置について説明する。
【００１７】
この図２に示すオーディオ信号符号化装置は、供給された時間軸信号を、直交変換部２５で例えばＭＤＣＴ（改良離散コサイン変換）により時間軸／周波数軸変換（Ｔ／Ｆ変換）を施して周波数軸上のデータ（ＭＤＣＴ係数）とし、これを係数量子化部４０で量子化することで符号化を行うものであるが、この実施の形態においては、直交変換の前の時間軸信号に対して、ＬＰＣ分析、ピッチ分析、エンベロープ抽出等により入力信号波形の特徴を抽出し、これらの特徴を表すパラメータは別途量子化して取り出すようにし、正規化（白色化）回路部１１においてこれらの特徴を除去、あるいは信号の相関性を除去することで、白色雑音に近い、いわゆるノイズライクな信号とすることで、符号化効率を高めている。
【００１８】
また、直交変換後の係数データの量子化の際のビット割当（ビットアロケーション）の決定には、上記ＬＰＣ分析で求められたＬＰＣ係数、ピッチ分析で求められたピッチパラメータを用いている。この他、周波数軸上で臨界帯域（クリティカルバンド）毎のピーク値やｒｍｓ値等を取り出して正規化ファクタとするバークスケールファクタを用いてもよい。これらのＬＰＣ係数、ピッチパラメータ、バークスケールファクタにより、ＭＤＣＴ係数のような直交変換係数データに対する量子化時の重みを算出し、これにより全帯域のビット割当を決定して係数量子化を行う。量子化時の重み決定が、予め規定されたパラメータによってなされる場合、例えば上記ＬＰＣ係数、ピッチパラメータ、バークスケールファクタのみによってなされる場合には、これらのパラメータのみをデコーダ側に伝送するだけで、エンコーダ側と全く同じビット割当（ビットアロケーション）が再現されるため、ビット割当そのものに関する付加情報（サイドインフォメーション）を送る必要がなくなる。
【００１９】
さらに、係数量子化の際には、上記量子化時の重みあるいは割当ビット数に従った順序で係数データを並べ替え（ソート）し、順に精度の高い量子化を行うようにしている。この量子化は、ソートされた係数を先頭から順にサブベクトルに区切り、それぞれベクトル量子化を行うことが好ましい。ソートについては、全帯域の係数データに対して行ってもよいが、いくつかの帯域に区切って、それぞれの帯域の範囲内毎にソートするようにしてもよい。この場合も、ビット割当に用いられるパラメータが予め規定されていれば、そのパラメータを送るだけで、ビット割当情報やソートされた係数の位置情報等を直接送らなくても、ビット割当やソート順序等をデコーダ側で再現できる。
【００２０】
図２において、入力端子１０には、例えば、０〜８ｋHz程度のいわゆる広帯域音声信号をサンプリング周波数Ｆs ＝１６ｋHzでＡ／Ｄ変換したディジタルオーディオ信号が供給されている。この入力信号は、正規化（白色化）回路部１１のＬＰＣ逆フィルタ１２に送られると共に、例えば１０２４サンプルずつ切り出され、ＬＰＣ分析・量子化部３０に送られている。このＬＰＣ分析・量子化部３０では、ハミング窓かけをした上で、２０次程度のＬＰＣ係数、すなわちαパラメータを算出し、ＬＰＣ逆フィルタ１２によりＬＰＣ残差を求めている。このＬＰＣ分析の際には、分析の単位となる１フレーム１０２４サンプルの内の一部サンプル、例えば１／２の５１２サンプルを次のブロックとオーバーラップさせており、フレームインターバルは５１２サンプルとなっている。これは、後段の直交変換として採用されているＭＤＣＴ（改良離散コサイン変換）のエリアシングキャンセレーションを利用するためである。このＬＰＣ分析・量子化部３０では、ＬＰＣ係数であるαパラメータをＬＳＰ（線スペクトル対）パラメータに変換して量子化したものを伝送するようにしている。
【００２１】
ＬＰＣ分析回路３２からのαパラメータは、α→ＬＳＰ変換回路３３に送られて、線スペクトル対（ＬＳＰ）パラメータに変換される。これは、直接型のフィルタ係数として求まったαパラメータを、例えば２０個、すなわち１０対のＬＳＰパラメータに変換する。変換は例えばニュートン−ラプソン法等を用いて行う。このＬＳＰパラメータに変換するのは、αパラメータよりも補間特性に優れているからである。
【００２２】
α→ＬＳＰ変換回路３３からのＬＳＰパラメータは、ＬＳＰ量子化器３４によりベクトル量子化あるいはマトリクス量子化される。このとき、フレーム間差分をとってからベクトル量子化、あるいは、複数フレーム分をまとめてマトリクス量子化してもよい。
【００２３】
このＬＳＰ量子化器３４からの量子化出力、すなわちＬＳＰベクトル量子化のインデクスは、端子３１を介して取り出され、また量子化済みのＬＳＰベクトルあるいは逆量子化出力は、ＬＳＰ補間回路３６及びＬＳＰ→α変換回路３８に送られる。
【００２４】
ＬＳＰ補間回路３６は、ＬＳＰ量子化器３４で上記フレーム毎にベクトル量子化されたＬＳＰのベクトルの前フレームと現フレームとの組を補間し、後の処理で必要となるレートにするためのものであり、この例では、８倍のレートに補間している。
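The 8x-rate LSP interpolation described in this paragraph can be sketched as follows. This is a minimal illustration that assumes plain linear interpolation between the previous-frame and current-frame quantized LSP vectors; the text states only the 8x rate, not the interpolation rule, and the function name `interpolate_lsp` is hypothetical.

```python
def interpolate_lsp(prev_lsp, curr_lsp, steps=8):
    """Return `steps` LSP vectors moving from prev_lsp toward curr_lsp.

    Linear interpolation is an assumption made for illustration; the
    patent text only states that the rate is raised by a factor of 8.
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same length")
    vectors = []
    for s in range(1, steps + 1):
        t = s / steps  # t runs from 1/steps up to 1.0
        vectors.append([(1.0 - t) * p + t * c
                        for p, c in zip(prev_lsp, curr_lsp)])
    return vectors


if __name__ == "__main__":
    prev = [0.10, 0.25, 0.40]
    curr = [0.12, 0.30, 0.44]
    for vec in interpolate_lsp(prev, curr):
        print(["%.4f" % v for v in vec])
```

Each interpolated vector would then be converted back to α parameters before it drives the LPC inverse filter, which is why the filter coefficients change smoothly inside a frame.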
【００２５】
このような補間が行われたＬＳＰベクトルを用いて入力音声の逆フィルタリングを実行するために、ＬＳＰ→α変換回路３７により、ＬＳＰパラメータを例えば２０次程度の直接型フィルタの係数であるαパラメータに変換する。このＬＳＰ→α変換回路３７からの出力は、上記ＬＰＣ残差を求めるためのＬＰＣ逆フィルタ回路１２に送られ、このＬＰＣ逆フィルタ１２では、８倍のレートで更新されるαパラメータにより逆フィルタリング処理を行って、滑らかな出力を得るようにしている。
【００２６】
また、ＬＳＰ量子化回路３４からの１倍レートのＬＳＰ係数は、ＬＳＰ→α変換回路３８に送られてαパラメータに変換され、上述したようなビット割当を行わせるためのビット割当算出回路（ビットアロケーション決定回路）４１に送られる。ビット割当算出回路４１では、割当ビットの他に、後述するＭＤＣＴ係数の量子化に使用する重みｗ(ω) の計算も行っている。
【００２７】
正規化（白色化）回路部１１のＬＰＣ逆フィルタ１２からの出力は、長期予測であるピッチ予測のためのピッチ逆フィルタ１３及びピッチ分析回路１５に送られる。
【００２８】
次に、長期予測について説明する。長期予測は、ピッチ分析により求められたピッチ周期あるいはピッチラグ分だけ時間軸上でずらした波形を元の波形から減算してピッチ予測残差を求めることにより行っており、この例では３点ピッチ予測によって行っている。なお、ピッチラグとは、サンプリングされた時間軸データのピッチ周期に対応するサンプル数のことである。
【００２９】
すなわち、ピッチ分析回路１５では１フレームに１回の割合、すなわち分析長が１フレームでピッチ分析が行われ、ピッチ分析結果の内のピッチラグはピッチ逆フィルタ１３及びビット割当算出回路４１に送られ、ピッチゲインはピッチゲイン量子化器１６に送られる。また、ピッチ分析回路１５からのピッチラグインデクスは端子５２から取り出されてデコーダ側に送られる。
【００３０】
ピッチゲイン量子化器１６では、上記３点予測に対応する３点でのピッチゲインがベクトル量子化され、コードブックインデクス（ピッチゲインインデクス）が出力端子５３より取り出され、代表値ベクトルあるいは逆量子化出力がピッチ逆フィルタ１３に送られる。ピッチ逆フィルタ１３は、上記ピッチ分析結果に基づいて３点ピッチ予測されたピッチ予測残差を出力する。このピッチ予測残差は、割り算回路１４及びエンベロープ抽出回路１７にそれぞれ送られている。
【００３１】
上記ピッチ分析についてさらに説明すると、このピッチ分析においては、上記ＬＰＣ残差を用いピッチパラメータを抽出する。ピッチパラメータは、ピッチラグ、ピッチゲインにより構成される。
【００３２】
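First, the pitch lag is determined. For example, 512 samples are cut out from the central part of the LPC residual and denoted x(n) (n = 0 to 511), written as the vector x. Letting x_k be the 512 samples located k samples earlier than x, the pitch k is given as the value that minimizes
||x − g·x_k||²
That is, with
g = (x, x_k) / ||x_k||²
the optimum lag K can be determined by searching for the k that maximizes
(x, x_k)² / ||x_k||²
In this embodiment, 12 ≤ K ≤ 240. This K may be used as is, or the result of tracking that uses the pitch lags of past frames may be used instead. For the K determined in this way, the optimum pitch gains at three points (K, K−1, K+1) are obtained next. That is, the g_-1, g_0, g_1 that minimize
||x − (g_-1·x_(K+1) + g_0·x_K + g_1·x_(K−1))||²
are obtained and taken as the three-point pitch gains for the optimum lag K. These three-point pitch gains are sent to the pitch gain quantizer 16 and vector quantized together, and the pitch inverse filter 13 is configured using the quantized pitch gains and the optimum lag K to obtain the pitch residual. The pitch residual thus obtained is concatenated with the past pitch residual already obtained and is MDCT transformed as described later. Time-axis gain control may be applied before the MDCT.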
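The lag search and the three-point gain fit described above can be sketched in plain Python. This is an illustrative sketch, not the patented implementation: the helper names (`find_pitch_lag`, `three_tap_gains`, `pitch_residual`) are hypothetical, the gains are left unquantized (the text quantizes them before filtering), and the caller is assumed to supply at least 241 samples of history before `start`.

```python
import random


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def lagged(signal, start, length, lag):
    """Return the `length` samples that lie `lag` samples before `start`."""
    return signal[start - lag:start - lag + length]


def find_pitch_lag(signal, start, length=512, min_lag=12, max_lag=240):
    """Search the lag k that maximizes (x, x_k)^2 / ||x_k||^2."""
    x = signal[start:start + length]
    best_lag, best_score = min_lag, -1.0
    for k in range(min_lag, max_lag + 1):
        xk = lagged(signal, start, length, k)
        energy = dot(xk, xk)
        if energy <= 0.0:
            continue  # an all-zero candidate cannot predict anything
        score = dot(x, xk) ** 2 / energy
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / m[r][r]
    return sol


def three_tap_gains(signal, start, lag, length=512):
    """Least-squares gains (g_-1, g_0, g_1) paired with lags K+1, K, K-1."""
    x = signal[start:start + length]
    basis = [lagged(signal, start, length, lag + d) for d in (1, 0, -1)]
    normal = [[dot(u, v) for v in basis] for u in basis]
    rhs = [dot(u, x) for u in basis]
    return solve_linear(normal, rhs)


def pitch_residual(signal, start, lag, gains, length=512):
    """Subtract the three-tap prediction from the target block."""
    x = signal[start:start + length]
    basis = [lagged(signal, start, length, lag + d) for d in (1, 0, -1)]
    return [x[n] - sum(g * b[n] for g, b in zip(gains, basis))
            for n in range(length)]


if __name__ == "__main__":
    random.seed(0)
    sig = [0.0] * 1000
    for n in range(1000):
        past = sig[n - 50] if n >= 50 else 0.0
        sig[n] = 0.9 * past + random.uniform(-1.0, 1.0)
    K = find_pitch_lag(sig, 400)
    print(K, [round(v, 3) for v in three_tap_gains(sig, 400, K)])
```

The three gains come from the 3x3 normal equations of the least-squares problem, so the fit is exact for the unquantized case; a real encoder would pick the nearest codebook vector instead.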
【００３３】
ここで、図３は、入力信号に対する上記ＬＰＣ分析処理及びピッチ分析処理の関係を示しており、１フレームＦＲが例えば１０２４サンプルの分析長は、後述するＭＤＣＴ変換ブロックに対応した長さとなっている。時刻ｔ₁ が現在の新しいＬＰＣ分析中心（ＬＳＰ₁）を示し、時刻ｔ₀ が１フレーム前のＬＰＣ分析中心（ＬＳＰ₀）を示している。現在フレームの後半は新しいデータ（new data）ＮＤ、前半は前データ（previous data）ＰＤであり、図中のａはＬＳＰ₀ とＬＳＰ₁ の補間により得られるＬＰＣ残差を、ｂは１フレーム前のＬＰＣ残差を、ｃはこの部分（ｂの後半＋ａの前半）をターゲットとするピッチ分析より得られる新しいピッチ残差を、ｄは過去のピッチ残差をそれぞれ示している。この図３における新しいデータＮＤが全て入力された時点で、データａを求めることができ、このａと、既に求められているｂとを用いて新しいピッチ残差ｃを算出でき、これと既に求められているピッチ残差ｄとをつなぎ合わせることで、直交変換すべき１フレームのデータＦＲが作成できる。この１フレームＦＲのデータをＭＤＣＴ等の直交変換処理することができる。
【００３４】
次に、図４は、ＬＰＣ分析に基づくＬＰＣ逆フィルタ及びピッチ分析に基づくピッチ逆フィルタを介すことによる時間軸信号の変化を、また図５は、ＬＰＣ逆フィルタ及びピッチ逆フィルタを介すことによる信号の周波数軸上での変化をそれぞれ示している。すなわち、図４の（Ａ）は入力信号波形を、図５の（Ａ）はその周波数スペクトルを示し、これにＬＰＣ分析に基づくＬＰＣ逆フィルタを介すことにより、波形の特徴が抽出され除去されて、図４の（Ｂ）に示すようなほぼ周期的なパルス状の時間軸波形（ＬＰＣ残差波形）となる。このＬＰＣ残差波形に対応する周波数上のスペクトルは、図５の（Ｂ）のようになる。このＬＰＣ残差に対して上述したようなピッチ分析に基づくピッチ逆フィルタを介すことにより、ピッチ成分が抽出されて除去され、図４の（Ｃ）に示すような白色雑音に近い（ノイズライクな）時間軸信号になり、その周波数軸上のスペクトルは図５の（Ｃ）のようになる。
【００３５】
さらに、本発明の実施の形態においては、正規化（白色化）回路部１１において、フレーム内データのゲインの平滑化を行っている。これは、フレーム内の時間軸波形（本実施の形態ではピッチ逆フィルタ１３からの残差）から、エンベロープ抽出回路１７によりエンベロープを抽出し、抽出されたエンベロープを、スイッチ１９を介してエンベロープ量子化器２０に送り、量子化されたエンベロープの値により上記時間軸波形（ピッチ逆フィルタ１３からの残差）を割り算器１４で割り込むことにより、時間軸で平滑化された信号を得ている。この割り算器１４からの信号が、正規化（白色化）回路部１１の出力として、次段の直交変換回路部２５に送られる。
【００３６】
この平滑化により、量子化後の直交変換係数を時間信号に逆変換したときの量子化誤差の大きさをオリジナル信号のエンベロープに追従させる、いわゆるノイズシェイピングが実現できる。
【００３７】
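Envelope extraction in the envelope extraction circuit 17 is described next. Let the signal supplied to the envelope extraction circuit 17, that is, the residual signal normalized by the LPC inverse filter 12 and the pitch inverse filter 13, be x(n), n = 0 to N−1 (N is the number of samples in one frame FR, that is, the orthogonal transform window length, for example N = 1024). The rms (root mean square) value of each sub-block, or sub-frame, cut out by a window of length M shorter than the transform window length N, for example M = N/8, is used as the envelope. That is, the rms_i of the i-th sub-block (i = 0 to N/M−1) is defined by the following equation (1).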
【００３８】
【数１】
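The image for equation (1) did not survive extraction. From the definition in the surrounding text (the rms of the residual x(n) over each sub-block of length M), a plausible reconstruction is the following. It is an inference from the prose, not a transcription of the original equation, and it assumes contiguous, non-overlapping sub-blocks, one rms value per sub-block.

```latex
\mathrm{rms}_i = \sqrt{\frac{1}{M}\sum_{n=iM}^{(i+1)M-1} x(n)^2},
\qquad i = 0, 1, \ldots, \frac{N}{M}-1 \tag{1}
```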

【００３９】
上記式（１）により求められるｒｍｓ_i の各ｉについて、スカラ量子化を施し、あるいはｒｍｓ_i 全体を１つのベクトルとしてベクトル量子化を行うことができる。本実施の形態では、エンベロープ量子化器２０においてベクトル量子化を行っており、そのインデクスは時間軸ゲインコントロールのためのパラメータ、すなわちエンベロープインデクスとして端子２１より取り出され、デコーダ側に伝送される。
【００４０】
このようにして量子化された各サブブロック（サブフレーム）毎のｒｍｓ_i をｑｒｍｓ_i とし、この値により上記入力残差信号ｘ(ｎ)を割り算器１４にて割り込むことにより、時間軸で平滑化された信号ｘ_s(ｎ) を得る。ただし、このようにして求めたｒｍｓ_i の内、フレーム内で最大のものと最小のものとの比が、ある一定の値（例えば４）以上のとき、上述したゲインコントロールを行い、パラメータ（上記エンベロープインデクス）の量子化のために所定のビット数（例えば７ビット）を割り当てているが、フレーム内の各サブブロック（サブフレーム）毎のｒｍｓ_i の比が上記一定の値よりも小さいときにはゲインコントロールを行わない通常の処理を行い、ゲインコントロールのためのビットは、他のパラメータ、例えば周波数軸パラメータ（直交変換係数データ）の量子化に割り当てられる。このゲインコントロールを行うか否かの判別は、ゲインコントロールオン／オフ決定回路１８により行われ、その判別出力（ゲインコントロールＳＷ）は、エンベロープ量子化器２０の入力側のスイッチ１９のスイッチング制御信号として送られるとともに、後述する係数量子化部４０内の係数量子化回路４５に送られて、ゲインコントロールがオンのときとオフのときの係数の割当ビット数の切り換えに使用される。また、このゲインコントロールオン／オフ判別出力（ゲインコントロールＳＷ）は、端子２２を介して取り出され、デコーダ側に送られる。
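The envelope extraction and the on/off decision for time-axis gain control described above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the vector quantizer of the envelope is replaced by an identity stand-in unless a `quantize` callable is supplied, and the threshold of 4 is the example value given in the text.

```python
import math


def subblock_rms(x, num_sub=8):
    """rms of each of `num_sub` equal, contiguous sub-blocks of frame x."""
    m = len(x) // num_sub  # sub-block length M
    return [math.sqrt(sum(v * v for v in x[i * m:(i + 1) * m]) / m)
            for i in range(num_sub)]


def gain_control(x, num_sub=8, ratio_threshold=4.0, quantize=None):
    """Return (smoothed frame, quantized envelope or None, on/off flag)."""
    rms = subblock_rms(x, num_sub)
    # Gain control is switched on only when max/min rms reaches the threshold.
    enabled = max(rms) > 0.0 and max(rms) >= ratio_threshold * min(rms)
    if not enabled:
        return list(x), None, False
    # Identity "quantizer" by default: a stand-in for the vector quantizer.
    qrms = [quantize(r) if quantize else r for r in rms]
    m = len(x) // num_sub
    eps = 1e-12  # guards against division by a zero envelope value
    smoothed = [x[n] / max(qrms[min(n // m, num_sub - 1)], eps)
                for n in range(len(x))]
    return smoothed, qrms, True


if __name__ == "__main__":
    frame = [0.1] * 512 + [1.0] * 512  # quiet first half, loud second half
    smoothed, envelope, on = gain_control(frame)
    print(on, [round(e, 3) for e in envelope])
```

When the flag is off, the bits that would have carried the envelope index are free for the coefficient quantizer, which is why the flag itself is also sent to the decoder.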
【００４１】
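The signal x_s(n), gain controlled (or gain compressed) by the divider 14 and thereby smoothed on the time axis, is sent as the output of the normalization circuit section 11 to the orthogonal transform circuit section 25, where it is converted into frequency-axis parameters (coefficient data) by, for example, the MDCT. The orthogonal transform circuit section 25 consists of a windowing circuit 26 and an MDCT circuit 27. The windowing circuit 26 applies a window function that allows the aliasing cancellation of the MDCT with 1/2-frame overlap to be used.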
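The aliasing cancellation that the 1/2-frame overlap relies on can be checked with a small, direct O(N²) MDCT. This sketch assumes a sine window, which is one common window that satisfies the required condition; the text does not name the window actually used. It covers only the plain case without time-axis gain control, and all function names are hypothetical.

```python
import math


def sine_window(m):
    """Length-2m sine window; w[n]^2 + w[n+m]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * m)) for n in range(2 * m)]


def mdct(block, window):
    """Windowed MDCT: 2m time samples -> m coefficients."""
    m = len(block) // 2
    xw = [b * w for b, w in zip(block, window)]
    return [sum(xw[n] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, window):
    """Windowed inverse MDCT: m coefficients -> 2m aliased time samples."""
    m = len(coeffs)
    return [window[n] * (2.0 / m) *
            sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for k in range(m))
            for n in range(2 * m)]


def analyze_synthesize(signal, m):
    """MDCT every m samples with 2m-sample frames, then overlap-add."""
    window = sine_window(m)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * m + 1, m):
        frame = imdct(mdct(signal[start:start + 2 * m], window), window)
        for n, v in enumerate(frame):
            out[start + n] += v  # aliasing terms of neighbours cancel here
    return out


if __name__ == "__main__":
    sig = [math.sin(0.3 * n) + 0.1 * n for n in range(40)]
    rec = analyze_synthesize(sig, 8)
    # Samples covered by two frames are reconstructed; the edges are not.
    print(max(abs(rec[n] - sig[n]) for n in range(8, 32)))
```

With N = 1024 and a frame interval of 512 samples, `m` would be 512; the small size in the demo only keeps the direct summation fast.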
【００４２】
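In decoding on the decoder side, inverse quantization is performed from the quantization indices of the transmitted frequency-axis parameters (for example, MDCT coefficients), the result is returned to a time-axis signal by an inverse orthogonal transform (frequency-axis/time-axis conversion), and then overlap-add and the inverse of the gain smoothing applied at encoding (gain expansion, or gain restoration) are carried out using the inversely quantized time-axis gain control parameters. When gain smoothing is used, however, the usual overlap-add cannot be used, because it assumes a symmetric window whose squared values at overlapping positions sum to a constant. The following processing is therefore required.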
【００４３】
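FIG. 6 shows overlap-add and gain control on the decoder side. In FIG. 6, w(n), n = 0 to N−1, denotes the analysis/synthesis window, and g(n) is the time-axis gain control parameter, that is,
g(n) = qrms_j (where j satisfies jM ≤ n < (j+1)M)
Here g₁(n) is g(n) of the current frame FR₁, and g₀(n) is g(n) of the frame one frame earlier (the previous frame FR₀). In FIG. 6, one frame is divided into eight sub-blocks (sub-frames) SB (N/M = 8).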
【００４４】
前フレームＦＲ₀ の後半のデータに対し、エンコーダ側ではゲインコントロールのためのｇ₀(n+(N/2))による除算後、ＭＤＣＴのための分析窓ｗ((N/2)-1-n) がかかっているため、デコーダ側で逆ＭＤＣＴ後、再び分析窓ｗ((N/2)-1-n) をかけて得られる信号、すなわち、主成分とエリアシング（aliasing）成分との和Ｐ(ｎ)は、次の式（２）のようになる。
【００４５】
【数２】

【００４６】
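Similarly, for the first-half data of the current frame FR₁, the encoder applies the MDCT analysis window w(n) after division by g₁(n) for gain control. The signal obtained on the decoder side by applying the analysis window w(n) again after the inverse MDCT, that is, the sum Q(n) of the principal component and the aliasing component, is therefore given by the following equation (3).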
【００４７】
【数３】

【００４８】
従って、再生すべきｘ(ｎ)は、次の式（４）として求められる。
【００４９】
【数４】

【００５０】
このような窓掛けを行い、上記サブブロック（サブフレーム）毎のｒｍｓをエンベロープとしてゲインコントロールを行うことにより、時間変化の激しい音、例えば鋭いアタックを有する楽音や、ピッチピークの間で比較的早い減衰をするような音声に対して、プリエコーのような耳につきやすい量子化雑音を低減することができる。
【００５１】
次に、直交変換回路部２５のＭＤＣＴ回路２７でＭＤＣＴ処理されて得られたＭＤＣＴ係数データは、係数量子化部４０のフレームゲイン正規化回路４３及びフレームゲイン算出・量子化回路４７に送られる。本実施の形態の係数量子化部４０では、先ず上記ＭＤＣＴ変換ブロックである１フレームの係数全体のフレームゲイン（ブロックゲイン）を算出してゲイン正規化を行った後、さらに聴覚に合わせて高域ほどバンド幅を広くしたサブバンドである臨界帯域（クリティカルバンド）に分割して、それぞれのバンド毎のスケールファクタ、いわゆるバークスケールファクタを算出し、これによって再び正規化を行っている。上記バークスケールファクタとしては、各帯域毎にその帯域内の係数のピーク値や、あるいは二乗平均の平方根（ｒｍｓ）等を用いることができ、各バンドのバークスケールファクタはまとめてベクトル量子化される。
【００５２】
すなわち、係数量子化部４０のフレームゲイン算出・量子化回路４７では、上記ＭＤＣＴ変換ブロックであるフレーム毎のゲインが算出されて量子化され、そのコードブックインデクス（フレームゲインインデクス）が端子５５を介して取り出されてデコーダ側に送られると共に、量子化された値のフレームゲインがフレームゲイン正規化回路４３に送られて、入力をフレームゲインで割ることによる正規化が行われる。このフレームゲインで正規化された出力は、バークスケールファクタ算出・量子化回路４２及びバークスケールファクタ正規化回路４４に送られる。
【００５３】
バークスケールファクタ算出・量子化回路４２では、上記各臨界帯域毎のバークスケールファクタが算出されて量子化され、コードブックインデクス（バークスケールファクタインデクス）が端子５４を介して取り出されてデコーダ側に送られると共に、量子化された値のバークスケールファクタがビット割当算出回路４１及びバークスケールファクタ正規化回路４４に送られる。バークスケールファクタ正規化回路４４では、上記臨界帯域毎に帯域内の係数の正規化が行われ、バークスケールファクタで正規化された係数が係数量子化回路４５に送られる。
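The per-band normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions: the band edges are hypothetical (the text only says the bands widen toward high frequencies), the rms is used as the scale factor (the text also allows the peak value), and the factors are used unquantized, whereas the encoder in the text normalizes with the quantized values so that the decoder can undo the scaling exactly.

```python
import math


def band_scale_factors(coeffs, band_edges):
    """rms of the coefficients inside each band [edges[b], edges[b+1])."""
    factors = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = coeffs[lo:hi]
        factors.append(math.sqrt(sum(c * c for c in band) / len(band)))
    return factors


def normalize_bands(coeffs, band_edges, factors, eps=1e-12):
    """Divide each coefficient by the scale factor of its band."""
    out = list(coeffs)
    for (lo, hi), f in zip(zip(band_edges[:-1], band_edges[1:]), factors):
        for i in range(lo, hi):
            out[i] = coeffs[i] / max(f, eps)  # eps avoids division by zero
    return out


if __name__ == "__main__":
    coeffs = [3.0, -3.0, 1.0, 1.0, -1.0, 1.0, 0.5, 0.5]
    edges = [0, 2, 6, 8]  # hypothetical band boundaries
    factors = band_scale_factors(coeffs, edges)
    print(factors, normalize_bands(coeffs, edges, factors))
```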
【００５４】
係数量子化回路４５では、ビット割当算出回路４１からのビット割当情報に従って各係数に量子化ビット数が割り当てられて量子化が行われ、このとき、上述したゲインコントロールオン／オフ決定回路１８からのゲインコントロールＳＷ情報に応じて全体の割当ビット数の切換が行われる。これは、例えばベクトル量子化を行う場合には、上記ゲインコントロールオン時用と、オフ時用との２組のコードブックを用意しておき、上記ゲインコントロールＳＷ情報に応じてこれらのコードブックを切り換えるようにすればよい。
【００５５】
ここで、ビット割当算出回路４１におけるビット割当（ビットアロケーション）について説明すると、上述のようにして求められたＬＰＣ係数、ピッチパラメータ、バークスケールファクタ等により、各ＭＤＣＴ係数に対する量子化時の重みを算出し、これにより全帯域のＭＤＣＴ係数のビット割当を決定して量子化を行う。この重みは、ノイズシェイピングファクタと考えることができ、また、各パラメータを変更することで所望のノイズシェイピング特性を持たせることが可能である。一例として、本実施の形態においては、次の式に示すように、ＬＰＣ係数、ピッチパラメータ、及びバークスケールファクタのみを用いて、重みＷ(ω)を算出している。
【００５６】
【数５】

【００５７】
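Because the quantization weights are determined only by the LPC coefficients, the pitch parameters, and the Bark scale factors, transmitting just these three kinds of parameters to the decoder reproduces exactly the same bit allocation as in the encoder. No allocation position information or the like needs to be sent, and the rate of the side information (auxiliary information) can be lowered.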
【００５８】
次に、係数量子化回路４５での量子化の具体例について、図７〜図９を参照しながら説明する。
【００５９】
図７は、図２の係数量子化回路４５の具体的な構成の一例を示すものであり、入力端子１には、図２のバークスケールファクタ正規化回路４４からの正規化された係数データ（例えばＭＤＣＴ係数）ｙが供給されている。重み計算回路２は、図２のビット割当算出回路４１にほぼ相当するが、量子化ビットを割り当てるための各係数の重みを計算する部分のみを取り出したものである。この重み計算回路２では、上述したＬＰＣ係数、ピッチパラメータ、バークスケールファクタ等のパラメータに基づいて重みｗが計算される。ここで、１フレーム分の係数をベクトルｙ、１フレーム分の重みをベクトルｗで表すものとする。
【００６０】
これらの係数ベクトルｙ、重みベクトルｗを、必要に応じてバンド分割回路３に送ることにより、Ｌ個（Ｌ≧１）のバンドに分割する。バンド数としては、例えば低域、中域、高域の３バンド程度（Ｌ＝３）が挙げられるが、これに限定されず、またバンド分割しなくてもよい。この各バンド毎の係数、例えば第ｋ番目のバンドの係数をｙ _k、重みをｗ _k （０≦ｋ≦Ｌ−１）とするとき、
ｙ＝（ｙ ₀,ｙ ₁,...,ｙ _L-1）
ｗ＝（ｗ ₀,ｗ ₁,...,ｗ _L-1）
となる。このバンド分割のバンド数や各バンド毎の係数の個数は、予め設定された数値に固定されている。
【００６１】
次に、各バンドの係数ベクトルｙ ₀,ｙ ₁,...,ｙ _L-1 をそれぞれソート回路４₀,４₁,...,４_L-1 に送って、各バンド毎に、それぞれのバンド内の係数に対して、重みの順に従って順位をつける。これは、各バンド内の係数自体を、重みの順に従って並び替え（ソート）すればよいが、各係数の周波数軸上での位置あるいは順番を表す指標（インデクス）のみを重みの順にソートして、ソートされた指標（インデクス）に対応して各係数の量子化時の精度（割当ビット数等）を決定するようにしてもよい。係数自体をソートする場合には、任意の第ｋ番目のバンドについて、係数ベクトルｙ _k の各係数を重みの順にソートし、重み順にソートされた係数ベクトルｙ'_kを得る。
【００６２】
図８は、このソートの様子を示したものであり、図８の（Ａ）は第ｋバンドの重みベクトルｗ _k を、図８の（Ｂ）は第ｋバンドの係数ベクトルｙ _k をそれぞれ示している。この図８の例においては、第ｋ番目のバンド内の要素数を例えば８としており、重みベクトルｗ _k の各要素となる８個の重みをｗ₁,ｗ₂,...,ｗ₈ 、係数ベクトルｙ _k の各要素となる８個の係数をｙ₁,ｙ₂,...,ｙ₈ にてそれぞれ表している。図８の（Ａ）、（Ｂ）の例においては、係数ｙ₃ に対応する重みｗ₃ が最も大きく、以下重みの順に、ｗ₂,ｗ₆,...,ｗ₄ となっている。図８の（Ｃ）は、この重みの順に係数ｙ₁,ｙ₂,...,ｙ₈ を並べ替え（ソート）して、順にｙ₃,ｙ₂,ｙ₆,...,ｙ₄ とした係数ベクトルｙ'_kを示している。
【００６３】
次に、上述のように各バンド毎に重みの順に従ってソートされた各バンドの係数ベクトルｙ'₀,ｙ'₁,...,ｙ'_L-1 をそれぞれベクトル量子化器５₀,５₁,...,５_L-1 に送って、それぞれベクトル量子化を行う。ここで、各バンド毎の割当ビット数を予め固定しておき、バンド毎のエネルギが変化しても各バンドへの量子化ビット数の割当が変動することを防止することが好ましい。
【００６４】
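For this band-by-band vector quantization, when one band contains many elements, the band can be divided into several sub-vectors and each sub-vector vector quantized. That is, as shown in FIG. 9, the sorted coefficient vector y'_k of an arbitrary k-th band is divided into several sub-vectors according to predetermined element counts, for example three sub-vectors y'_k1, y'_k2, y'_k3, and each is vector quantized to obtain codebook indices c_k1, c_k2, c_k3. The indices c_k1, c_k2, c_k3 of the k-th band are collected into the coefficient index vector c_k. In quantizing the sub-vectors, allocating more quantization bits to the sub-vectors nearer the head yields quantization that follows the weights. In FIG. 9, for example, allocating 8 bits to y'_k1, 6 bits to y'_k2, and 4 bits to y'_k3 makes the number of bits per coefficient decrease in order from the head, which realizes bit allocation according to the weights.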
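The weight-ordered sort and the split into sub-vectors can be sketched as follows. The weights in the demo are hypothetical values chosen only so that the order matches the FIG. 8 example (w3 largest, then w2, w6, and w4 smallest), and the sub-vector sizes are likewise illustrative; the vector quantizer itself is not shown.

```python
def sort_by_weight(coeffs, weights):
    """Return coefficients reordered by descending weight, and the order.

    Python's sort is stable, so equal weights keep their frequency order,
    which keeps the result deterministic on both encoder and decoder.
    """
    order = sorted(range(len(coeffs)), key=lambda i: -weights[i])
    return [coeffs[i] for i in order], order


def split_subvectors(sorted_coeffs, sizes):
    """Cut the sorted band into consecutive sub-vectors of the given sizes."""
    if sum(sizes) != len(sorted_coeffs):
        raise ValueError("sub-vector sizes must cover the whole band")
    parts, pos = [], 0
    for size in sizes:
        parts.append(sorted_coeffs[pos:pos + size])
        pos += size
    return parts


if __name__ == "__main__":
    coeffs = ["y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8"]
    weights = [0.5, 0.9, 1.0, 0.1, 0.4, 0.8, 0.3, 0.2]  # hypothetical
    ordered, _ = sort_by_weight(coeffs, weights)
    print(split_subvectors(ordered, [3, 3, 2]))
```

Because the head sub-vector holds the coefficients with the largest weights, giving it the largest codebook is what turns the sort into a weight-driven bit allocation.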
【００６５】
次に、図７の各ベクトル量子化器５₀,５₁,...,５_L-1 からの各バンド毎の係数インデクスのベクトルｃ ₀,ｃ ₁,...,ｃ _L-1 をまとめて、全バンドの係数インデクスのベクトルｃとし、端子６から取り出している。この端子６は、図２の端子５１に相当する。
【００６６】
なお、上記図７〜図９の具体例では、直交変換された周波数軸上の係数（例えばＭＤＣＴ係数）自体を、上記重みに従ってソートし、ソートされた係数の順序に従って割当ビット数を多いものから少なくするようにして（ソート後の順位の上位側の係数ほど多くのビットを割り当てて）いるが、直交変換されて得られた各係数の周波数軸上での位置あるいは順番を表す指標（インデクス）のみを上記重みの順にソートして、ソートされた指標（インデクス）に対応して各係数の量子化時の精度（割当ビット数等）を決定するようにしてもよい。また、上述した具体例では、係数の量子化としてベクトル量子化を用いているが、スカラ量子化、あるいはスカラ量子化とベクトル量子化とを併用するような量子化に本発明を適用することも容易である。
【００６７】
次に、上述した図２に示すようなオーディオ信号符号化装置（エンコーダ側）に対応するオーディオ信号復号装置（デコーダ側）の構成の一例について、図１０を参照しながら説明する。
【００６８】
この図１０において、各入力端子６０〜６７には上記図２の各出力端子からのデータが供給されており、図１０の入力端子６０には、上記図２の出力端子５１からの直交変換係数（例えばＭＤＣＴ係数）のインデクスが供給されている。入力端子６１には、図２の出力端子３１からのＬＳＰインデクスが供給され、入力端子６２〜６５には、図２の各出力端子５２〜５５からのデータ、すなわち、ピッチラグインデクス、ピッチゲインインデクス、バークスケールファクタインデクス、フレームゲインインデクスがそれぞれ供給され、入力端子６６、６７には、図２の各出力端子２１、２２からのエンベロープインデクス、ゲインコントロールＳＷがそれぞれ供給されている。
【００６９】
入力端子６０からの係数インデクスは、係数逆量子化回路７１で逆量子化され、掛け算器７３を介して、例えばＩＭＤＣＴ（逆ＭＤＣＴ）等の逆直交変換回路７４に送られる。
【００７０】
入力端子６１からのＬＳＰインデクスは、ＬＰＣパラメータ再生部８０の逆量子化器８１に送られてＬＳＰデータに逆量子化され、ＬＳＰ→α変換回路８２及びＬＳＰ補間回路８３に送られる。ＬＳＰ→α変換回路８２からのαパラメータ（ＬＰＣ係数）は、ビット割当回路７２に送られる。ＬＳＰ補間回路８３からのＬＳＰデータは、ＬＳＰ→α変換回路８４でαパラメータ（ＬＰＣ係数）に変換され、後述するＬＰＣ合成回路７７に送られる。
【００７１】
ビット割当回路７２には、ＬＳＰ→α変換回路８２からの上記ＬＰＣ係数の他に、入力端子６２からのピッチラグと、入力端子６３から逆量子化器９１を介して得られたピッチゲインと、入力端子６４から逆量子化器９２を介して得られたバークスケールファクタとが供給されており、これらのパラメータのみに基づいて、エンコーダ側と同一のビット割当を再現することができる。ビット割当回路７２からのビット割当情報は、係数逆量子化器７１に送られて、各係数の量子化割当ビットの決定に使用される。
【００７２】
入力端子６５からのフレームゲインインデクスは、フレームゲイン逆量子化器８６に送られて逆量子化され、得られたフレームゲインが掛け算器７３に送られる。
【００７３】
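The envelope index from the input terminal 66 is sent via a switch 87 to an envelope inverse quantizer 88 and inversely quantized, and the resulting envelope data is sent to the overlap-add circuit 75. The gain control SW information from the input terminal 67 is sent to the coefficient inverse quantizer 71 and the overlap-add circuit 75 and is also used as the control signal for the switch 87. The coefficient inverse quantizer 71 switches the total number of allocated bits according to whether gain control is on or off, as described above; in the case of inverse vector quantization, it may switch between a codebook for gain control on and a codebook for gain control off.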
【００７４】
オーバラップ加算回路７５は、ＩＭＤＣＴ等の逆直交変換回路７４からの上記フレーム毎に時間軸に戻された信号を、１／２フレームずつオーバーラップさせながら加算するものであり、ゲインコントロールのオン時には、上記エンベロープ逆量子化器８８からのエンベロープデータによるゲインコントロール（上述したゲイン伸長あるいはゲイン復元）処理しながらオーバーラップ加算する。
【００７５】
オーバラップ加算回路７５からの時間軸信号は、ピッチ合成回路７６に送られて、ピッチ成分が復元される。これは、図２のピッチ逆フィルタ１３での処理の逆処理に相当するものであり、端子６２からのピッチラグ及び逆量子化器９１からのピッチゲインが用いられる。
【００７６】
ピッチ合成回路７６からの出力は、ＬＰＣ合成回路７７に送られて、図２のＬＰＣ逆フィルタ１２での処理の逆の処理に対応するＬＰＣ合成処理が施され、出力端子７８より取り出される。
【００７７】
ここで、上記エンコーダ側の係数量子化部４０の係数量子化回路４５として、上記図７に示すような重みに従って各バンド毎にソートされた係数をベクトル量子化するものを用いる場合には、係数逆量子化回路７１として、図１１に示すような構成を用いることができる。
【００７８】
この図１１において、入力端子６０は上記図１０の入力端子６０に相当し、上記係数インデクス（ＭＤＣＴ係数等の直交変換係数が量子化されることで得られたコードブックインデクス）が供給され、重み計算回路７９には、図１０のＬＳＰ→α変換回路８２からのαパラメータ（ＬＰＣ係数）、入力端子６２からのピッチラグ、逆量子化器９１からのピッチゲイン、逆量子化器９２からのバークスケールファクタ等が供給されている。重み計算回路７９は、図１０のビット割当回路７２中の、量子化ビット割当の計算途中に求められる各係数の重みを算出するまでの構成部分を取り出したものである。この重み計算回路７９では、上述したように、上記式（５）の計算により、上記ＬＰＣ係数、ピッチパラメータ（ピッチラグ及びピッチゲイン）、及びバークスケールファクタのみを用いて、重みＷ(ω)を計算している。入力端子９３には、周波数軸上の係数の位置あるいは順番を示す指標（インデクス）、すなわち全帯域でＮ個の係数データがある場合には、０〜Ｎ−１の数値（これをベクトルＩとする）が供給されている。なお、重み計算回路７９からの上記Ｎ個の各係数に対するＮ個の重みをベクトルｗで表す。
【００７９】
重み計算回路７９からの重みｗ及び入力端子９３からの指標Ｉは、バンド分割回路９４に送られて、エンコーダ側と同様にＬ個のバンドに分割される。エンコーダ側で例えば低域、中域、高域の３バンド（Ｌ＝３）に分割されていれば、デコーダ側でも同じく３バンドに分割する。これらのバンド分割された各バンド毎の指標及び重みは、それぞれソート回路９５₀,９５₁,...,９５_L-1 に送られる。例えば第ｋ番目のバンド内の指標Ｉ _k 及び重みｗ _k は、第ｋ番目のソート回路９５_k に送られる。ソート回路９５_k では、第ｋ番目のバンド内の指標Ｉ _k が、各係数の重みｗ _k の順序に従って並べ替え（ソート）され、ソートされた指標Ｉ'_k が出力される。各ソート回路９５₀,９５₁,...,９５_L-1 からのそれぞれのバンド毎にソートされた指標Ｉ ₀,Ｉ ₁,...,Ｉ _L-1 は、係数再構成回路９７に送られる。
【００８０】
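The orthogonal transform coefficient indices from the input terminal 60 of FIG. 10 were obtained at the encoder, as described with FIGS. 7 to 9, by dividing the coefficients into L bands, sorting the coefficients of each band in order of weight, dividing each band into sub-vectors whose sizes follow a predetermined rule, and vector quantizing each sub-vector. Specifically, the sets of coefficient indices of the L bands are the vectors c₀, c₁, ..., c_L-1, and these per-band coefficient index vectors are sent to inverse quantizers 96₀, 96₁, ..., 96_L-1, respectively. The coefficient data obtained by inverse quantization in the inverse quantizers 96₀, 96₁, ..., 96_L-1 is sorted within each band in order of weight, that is, it corresponds to the coefficient vectors y'₀, y'₁, ..., y'_L-1 from the sorting circuits 4₀, 4₁, ..., 4_L-1 of FIG. 7, so its ordering differs from the positions on the frequency axis. The index I that represents the position of each coefficient on the frequency axis is therefore sorted according to the weights beforehand, and the function of the coefficient reconstruction circuit 97 is to associate the sorted indices with the inversely quantized coefficient data and restore the original order on the frequency axis. That is, the coefficient reconstruction circuit 97 associates the per-band sorted indices from the sorting circuits 95₀, 95₁, ..., 95_L-1 with the coefficient data from the inverse quantizers 96₀, 96₁, ..., 96_L-1, which is sorted within each band in order of weight, and rearranges (inversely sorts) the inversely quantized coefficient data according to the sorted indices to obtain coefficient data y arranged in the original order on the frequency axis, which is output from the output terminal 98. The coefficient data from the output terminal 98 is sent to the multiplier 73 of FIG. 10.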
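The decoder-side inverse sort can be sketched as follows. The point of the sketch is that the decoder needs only the weights, which it recomputes from the transmitted parameters, to put the coefficients back in frequency order; no position information is transmitted. The function names are hypothetical, and a single band is shown for brevity.

```python
def sorted_indices(weights):
    """Indices 0..N-1 ordered by descending weight (ties keep index order)."""
    return sorted(range(len(weights)), key=lambda i: -weights[i])


def restore_order(sorted_coeffs, weights):
    """Put weight-sorted coefficients back at their frequency positions."""
    order = sorted_indices(weights)
    restored = [None] * len(weights)
    for rank, index in enumerate(order):
        restored[index] = sorted_coeffs[rank]
    return restored


if __name__ == "__main__":
    weights = [0.5, 0.9, 1.0, 0.1, 0.4, 0.8, 0.3, 0.2]  # hypothetical
    original = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
    # Encoder side: reorder by weight (quantization omitted in this sketch).
    sent = [original[i] for i in sorted_indices(weights)]
    # Decoder side: same weights, so the same order can be undone.
    print(restore_order(sent, weights) == original)
```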
【００８１】
なお、本発明は上記実施の形態のみに限定されるものではなく、例えば、入力時間軸信号は音声や音楽を含むオーディオ信号以外に、電話帯域の音声信号や、ビデオ信号等でもよい。また、正規化回路部１１の構成や、ＬＰＣ分析及びピッチ分析は、これらに限定されず、線形予測あるいは非線形予測等により時間軸入力波形の特徴あるいは相関性を抽出して除去する種々の構成がとり得る。また、各量子化器には、ベクトル量子化以外にも、スカラ量子化や、スカラ量子化とベクトル量子化とを併用するようにしてもよい。
【００８２】
【発明の効果】
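As is clear from the above description, according to the present invention, in signal encoding that encodes a time-axis input signal using an orthogonal transform, the correlation or characteristic portions of the signal waveform are removed prior to the orthogonal transform on the basis of information obtained by linear predictive coding (LPC) analysis and pitch analysis on the time axis. A noise-like residual signal close to white noise is therefore orthogonally transformed, and the encoding efficiency can be increased.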
【００８３】
また、時間軸上の入力信号に対して直交変換を用いて符号化を行う際に、上記直交変換されて得られた係数の量子化の際のビット割当を、上記入力信号の線形予測符号化（ＬＰＣ）分析の結果及びピッチ分析の結果に基づいて決定することにより、ビット割当のためだけの情報を送らなくとも、ＬＰＣ分析結果及びピッチ分析結果の各パラメータを送るだけでデコーダ側でエンコーダ側と同じビット割当を再現でき、付加情報（サイドインフォメーション）のレートを抑えて、全体的なビットレートを低減することができ、符号化効率の向上に貢献する。
【００８４】
また、上記直交変換として、改良離散コサイン変換（ＭＤＣＴ）を用いることにより、良好な音質でのオーディオ信号の高能率符号化が行える。
【図面の簡単な説明】
【図１】本発明の実施の形態の概略構成を示すブロック図である。
【図２】本発明の実施の形態のより具体的な構成例であるオーディオ信号符号化装置を示すブロック図である。
【図３】入力信号に対するＬＰＣ分析処理及びピッチ分析処理の関係を示す図である。
【図４】時間軸入力信号のＬＰＣ分析及びピッチ分析による相関性の除去を説明するための時間軸信号波形図である。
【図５】時間軸入力信号のＬＰＣ分析及びピッチ分析による相関性の除去を説明するための周波数特性を示す図である。
【図６】デコーダ側でのオーバーラップ加算を説明するための時間軸信号波形図である。
【図７】係数量子化回路の具体的構成の一例を示すブロック図である。
【図８】バンド分割された１つのバンド内の係数の重みに応じたソートを説明するための図である。
【図９】バンド分割された１つのバンド内で重みに応じてソートされた係数をサブベクトルに区切ってベクトル量子化する処理を説明するための図である。
【図１０】図２のオーディオ信号符号化装置に対応する復号側構成としてのオーディオ信号復号装置の一例を示すブロック図である。
【図１１】図１０のオーディオ信号復号装置の逆量子化回路の一具体例を示すブロック図である。
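[Explanation of reference numerals]
11 normalization circuit section, 12 LPC inverse filter, 13 pitch inverse filter, 15 pitch analysis circuit, 16 pitch gain quantization circuit, 17 envelope extraction circuit, 18 gain control on/off decision circuit, 20 envelope quantization circuit, 25 orthogonal transform circuit section, 26 windowing circuit, 27 MDCT circuit, 30 LPC analysis/quantization section, 32 LPC analysis circuit, 33 α→LSP conversion circuit, 34 LSP quantization circuit, 36 LSP interpolation circuit, 37, 38 LSP→α conversion circuits, 40 coefficient quantization circuit section, 41 bit allocation calculation circuit, 42 Bark scale factor calculation/quantization circuit, 43 frame gain normalization circuit, 44 Bark scale factor normalization circuit, 45 coefficient quantization circuit, 47 frame gain calculation/quantization circuit
[0001]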
BACKGROUND OF THE INVENTION
The present invention relates to a signal encoding apparatus and method that apply a time-axis/frequency-axis transform to an input signal and quantize the result, and more particularly to a signal encoding apparatus and method suitable for high-efficiency encoding of audio signals.
[0002]
[Prior art]
Various encoding methods are known that compress a signal by exploiting the statistical properties of audio signals (including speech signals and music signals) in the time domain and the frequency domain together with the characteristics of human hearing. These encoding methods are broadly classified into time-domain coding, frequency-domain coding, analysis/synthesis coding, and the like.
[0003]
[Problems to be solved by the invention]
In transform coding, in which a time-axis input signal is orthogonally transformed into a frequency-axis signal and then encoded, it is desirable for coding efficiency to remove the features of the time-axis waveform of the input signal before transform coding.
[0004]
In addition, when the orthogonally transformed coefficient data on the frequency axis is quantized, bits are often allocated with weighting. Transmitting the information for this bit allocation as additional information, or side information, is undesirable because it increases the bit rate.
[0005]
The present invention has been made in view of these circumstances. Its object is to provide a signal encoding apparatus and method that remove the features or correlation of the time-axis waveform signal prior to the orthogonal transform and thereby increase the encoding efficiency, and that allow the decoder side to reproduce the bit allocation without the bit allocation information for quantization being sent directly, thereby contributing to a lower bit rate.
[0006]
[Means for Solving the Problems]
In order to solve the above problems, the present invention extracts a residual based on information obtained by performing linear predictive coding (LPC) analysis and pitch analysis on a time-axis input signal, normalizes it by performing gain control that smooths the gain using quantized envelope values on the time axis, applies an orthogonal transform to this output smoothed on the time axis, and quantizes the orthogonally transformed output.
[0007]
Here, the orthogonal transform preferably converts the input time-axis signal into coefficient data on the frequency axis by a modified discrete cosine transform (MDCT). The normalization preferably outputs an LPC prediction residual of the input signal based on LPC coefficients obtained by LPC analysis of the input signal, and removes the pitch correlation of the LPC prediction residual based on pitch parameters obtained by pitch analysis of the LPC prediction residual. Furthermore, the quantization means preferably performs quantization according to the number of allocated bits determined from the LPC analysis result and the pitch analysis result.
[0008]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments according to the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing a schematic configuration of a signal encoding apparatus according to an embodiment of the present invention.
[0009]
In FIG. 1, a waveform signal on the time axis, for example a digital audio signal, is input to an input terminal 10. A specific example is a so-called wideband speech signal of about 0 to 8 kHz sampled at a sampling frequency Fs of 16 kHz, but the invention is not limited to this.
[0010]
An input signal from the input terminal 10 is sent to the normalization circuit unit 11. The normalization circuit unit 11 is also called a whitening circuit, and performs whitening so as to extract features of the input time waveform signal and extract a prediction residual. Whitening of the time waveform can be performed by linear or non-linear prediction. For example, the input time waveform signal can be whitened by LPC (Linear Predictive Coding) and pitch analysis.
[0011]
In the example of FIG. 1, the normalization (whitening) circuit section 11 consists of an LPC inverse filter 12 and a pitch inverse filter 13. The input signal from the input terminal 10 is sent to an LPC analysis circuit 39 for LPC analysis, and the LPC coefficients (so-called α parameters) obtained by the analysis are sent to the LPC inverse filter 12 to extract the LPC prediction residual. The LPC prediction residual from the LPC inverse filter 12 is sent to the pitch analysis circuit 15 and the pitch inverse filter 13. The pitch analysis circuit 15 extracts the pitch parameters (pitch gain and pitch lag) by the pitch analysis described later and sends them to the pitch inverse filter 13. The pitch inverse filter 13 removes the pitch correlation from the LPC prediction residual to obtain the pitch residual and sends it to the orthogonal transform circuit 25. The LPC coefficients from the LPC analysis circuit 39 and the pitch parameters from the pitch analysis circuit 15 are also sent to the bit allocation calculation circuit 41, which determines the bit allocation for quantization.
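The LPC analysis and inverse filtering step of this paragraph can be sketched as follows. This is a hedged illustration rather than the apparatus itself: it assumes the autocorrelation method with the Levinson-Durbin recursion (the text does not say how the α parameters are computed), uses unquantized coefficients, and the function names are hypothetical. The demo uses a second-order synthetic signal only to keep the numbers easy to read.

```python
import random


def autocorr(x, order):
    """Autocorrelation r[0..order] of the sequence x."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Return (a, err): A(z) = sum a[k] z^-k with a[0] = 1, and the
    final prediction error energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err


def lpc_inverse_filter(x, a):
    """LPC residual e(n) = sum_k a[k] * x(n - k) (samples before 0 are 0)."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    random.seed(0)
    x = [0.0, 0.0]
    for _ in range(2000):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + random.uniform(-1.0, 1.0))
    a, err = levinson_durbin(autocorr(x, 2), 2)
    e = lpc_inverse_filter(x, a)
    print([round(v, 3) for v in a],
          sum(v * v for v in e) / sum(v * v for v in x))
```

The residual `e` is what would be passed on to pitch analysis and the pitch inverse filter; its energy is well below that of the input because the short-term correlation has been removed.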
[0012]
The whitened time waveform signal from the normalization circuit section 11, that is, the pitch residual of the LPC residual, is sent to the orthogonal transform circuit section 25, which performs time-axis/frequency-axis conversion (T/F mapping), and is converted into a frequency-axis signal (coefficient data). DCT (discrete cosine transform), MDCT (modified discrete cosine transform), FFT (fast Fourier transform), and the like are commonly used for this T/F conversion. The parameters or coefficient data obtained from the orthogonal transform circuit section 25, such as MDCT coefficients or FFT coefficients, are sent to the coefficient quantization section 40 and subjected to SQ (scalar quantization), VQ (vector quantization), or the like. To quantize the coefficients efficiently, the bit allocation for quantizing each coefficient must be determined. This bit allocation can be calculated from an auditory masking model, from parameters such as the LPC coefficients and pitch parameters obtained during whitening in the normalization circuit section 11, or from Bark scale factors calculated from the coefficient data. The Bark scale factors are, for example, the peak value or rms (root mean square) value of each critical band when the coefficients obtained by the orthogonal transform are divided into the so-called critical bands, frequency bands whose bandwidth widens toward higher frequencies in line with human auditory characteristics.
[0013]
In this embodiment, the bit allocation is specified to be calculated only from the LPC coefficients, the pitch parameters, and the Bark scale factors. By sending only these parameters, the decoder side can reproduce the same bit allocation as the encoder side, so no additional information (side information) representing the allocated bit counts themselves needs to be sent, which contributes to a lower bit rate.
[0014]
For the LPC coefficients (α parameters) used in the LPC inverse filter 12 of the normalization circuit section 11 and the pitch parameters (specifically the pitch gain) used in the pitch inverse filter 13, quantized values are used, as described later, so that they can be reproduced on the decoder side.
[0015]
Although the signal encoding device of FIG. 1 is shown as a hardware configuration, it can also be realized by software using a so-called DSP (digital signal processor) or the like.
[0016]
Next, an audio signal encoding apparatus as a more specific configuration example of the above-described embodiment of the present invention will be described with reference to FIG.
[0017]
In the audio signal encoding apparatus shown in FIG. 2, the supplied time-axis signal undergoes time-axis/frequency-axis conversion (T/F conversion) in the orthogonal transform section 25, for example by the MDCT (modified discrete cosine transform), to yield frequency-axis data (MDCT coefficients), which are quantized by the coefficient quantization section 40 to perform encoding. In this embodiment, the features of the input signal waveform are extracted from the time-axis signal before the orthogonal transform by LPC analysis, pitch analysis, envelope extraction, and so on, and the parameters representing these features are quantized and output separately. The normalization (whitening) circuit section 11 removes these features, or the correlation of the signal, to produce a so-called noise-like signal close to white noise, which raises the coding efficiency.
[0018]
The bit allocation for quantizing the coefficient data after the orthogonal transform is determined using the LPC coefficients obtained by the LPC analysis and the pitch parameters obtained by the pitch analysis. In addition, Bark scale factors may be used; these are normalization factors obtained by taking the peak value, rms value, or the like of each critical band on the frequency axis. From these LPC coefficients, pitch parameters, and Bark scale factors, the quantization weights for the orthogonal transform coefficient data, such as MDCT coefficients, are calculated, and the bit allocation over the entire band is determined from them for coefficient quantization. When the quantization weights are determined by predefined parameters, for example only by the LPC coefficients, pitch parameters, and Bark scale factors, transmitting just these parameters to the decoder side reproduces exactly the same bit allocation as on the encoder side, so no additional information (side information) about the bit allocation itself needs to be sent.
[0019]
Further, at coefficient quantization, the coefficient data is rearranged (sorted) in order of the quantization weights or allocated bit counts, and the coefficients are quantized with higher precision the earlier they appear in that order. For this quantization, it is preferable to divide the sorted coefficients into sub-vectors in order from the head and vector quantize each one. The sort may be applied to the coefficient data of the entire band, or the band may be divided into several bands and the sort applied within each. In this case too, if the parameters used for bit allocation are specified in advance, sending those parameters lets the decoder side reproduce the bit allocation, the sort order, and so on without the bit allocation information or the position information of the sorted coefficients being sent directly.
[0020]
In FIG. 2, the input terminal 10 is supplied with a digital audio signal obtained by A/D converting a so-called broadband audio signal of about 0 to 8 kHz at a sampling frequency Fs = 16 kHz. This input signal is sent to the LPC inverse filter 12 of the normalization (whitening) circuit unit 11, and is also cut out in units of, for example, 1024 samples and sent to the LPC analysis/quantization unit 30. The LPC analysis/quantization unit 30 applies a Hamming window and then calculates LPC coefficients of, for example, the 20th order, that is, the α parameters, and the LPC residual is obtained by the LPC inverse filter 12. In this LPC analysis, part of each 1024-sample frame that serves as the unit of analysis, for example half of it (512 samples), overlaps the next block, so the frame interval is 512 samples. This is done in order to use the aliasing cancellation of the MDCT (modified discrete cosine transform) adopted as the subsequent orthogonal transform. The LPC analysis/quantization unit 30 converts the α parameters into LSP (line spectrum pair) parameters, quantizes them, and transmits the result.
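The framing described here can be illustrated with a minimal sketch, assuming a plain Python list as the input signal. The function names `hamming_window` and `split_into_frames` are hypothetical and introduced only for illustration; the window formula is the standard Hamming definition, not something specified in this description.

```python
import math

def hamming_window(length):
    """Standard Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(length-1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def split_into_frames(signal, frame_len=1024, hop=512):
    """Cut the signal into frames of frame_len samples advancing by hop samples,
    so that consecutive frames share frame_len - hop samples."""
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames

# Stand-in for audio sampled at 16 kHz (values are arbitrary).
signal = [float(n) for n in range(2048)]
frames = split_into_frames(signal)
window = hamming_window(1024)
# Windowed frames as they might be handed to an LPC analysis routine.
windowed = [[s * w for s, w in zip(frame, window)] for frame in frames]
```

With a 512-sample hop, the second half of each frame reappears as the first half of the next one, which is the overlap that the MDCT stage relies on.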
[0021]
The α parameters from the LPC analysis circuit 32 are sent to the α → LSP conversion circuit 33 and converted into line spectrum pair (LSP) parameters. This converts the α parameters, obtained as direct-form filter coefficients, into, for example, 20 LSP parameters, that is, 10 pairs. The conversion is performed using, for example, the Newton-Raphson method. The conversion to LSP parameters is made because they have better interpolation characteristics than the α parameters.
[0022]
The LSP parameters from the α → LSP conversion circuit 33 are vector quantized or matrix quantized by the LSP quantizer 34. At this time, vector quantization after taking the interframe difference, or matrix quantization for a plurality of frames may be performed.
[0023]
The quantized output from the LSP quantizer 34, that is, the index of the LSP vector quantization, is taken out via the terminal 31, and the quantized LSP vector, that is, the inverse-quantized output, is sent to the LSP interpolation circuit 36 and the LSP → α conversion circuit 38.
[0024]
The LSP interpolation circuit 36 interpolates between the previous-frame and current-frame LSP vectors, which are vector quantized frame by frame in the LSP quantizer 34, so as to obtain the rate required by the subsequent processing. In this example, the vectors are interpolated to 8 times the frame rate.
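As a rough illustration of this 8-times-rate interpolation, the sketch below assumes simple linear interpolation between the previous and current LSP vectors. The description does not state which interpolation rule is used, so the linear rule, the function name `interpolate_lsp`, and the sample values are all assumptions.

```python
def interpolate_lsp(prev_lsp, curr_lsp, steps=8):
    """Return `steps` LSP vectors moving linearly from prev_lsp toward curr_lsp.
    The last vector equals curr_lsp. Linear interpolation is an assumption."""
    vectors = []
    for s in range(1, steps + 1):
        t = s / steps
        vectors.append([(1.0 - t) * p + t * c for p, c in zip(prev_lsp, curr_lsp)])
    return vectors

prev_lsp = [0.10, 0.30]   # toy 2-element vectors; the text uses about 20 LSPs
curr_lsp = [0.20, 0.50]
lsp_steps = interpolate_lsp(prev_lsp, curr_lsp)
```

Each of the eight vectors would then be converted back to α parameters so that the inverse filter coefficients change gradually within the frame.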
[0025]
In order to perform inverse filtering of the input speech using the interpolated LSP vectors, the LSP → α conversion circuit 37 converts the LSP parameters into α parameters, which are the coefficients of a direct-form filter of, for example, about the 20th order. The output from the LSP → α conversion circuit 37 is sent to the LPC inverse filter 12 for obtaining the LPC residual, and the LPC inverse filter 12 performs inverse filtering with α parameters updated at 8 times the frame rate so as to obtain a smooth output.
[0026]
Further, the 1×-rate LSP coefficients from the LSP quantizer 34 are sent to the LSP → α conversion circuit 38, converted into α parameters, and sent to the bit allocation calculation circuit (bit allocation determination circuit) 41, which performs the bit allocation described above. In addition to the allocated bits, the bit allocation calculation circuit 41 also calculates the weight w(ω) used for quantization of the MDCT coefficients described later.
[0027]
The output from the LPC inverse filter 12 of the normalization (whitening) circuit unit 11 is sent to the pitch inverse filter 13 and the pitch analysis circuit 15 for pitch prediction which is long-term prediction.
[0028]
Next, long-term prediction will be described. Long-term prediction is performed by subtracting from the original waveform a waveform shifted on the time axis by the pitch period, or pitch lag, determined by pitch analysis, thereby obtaining a pitch prediction residual. In this example, it is carried out by three-point pitch prediction. Note that the pitch lag is the number of samples corresponding to the pitch period of the sampled time axis data.
[0029]
That is, the pitch analysis circuit 15 performs pitch analysis at a rate of once per frame, that is, the analysis length is one frame, and the pitch lag in the pitch analysis result is sent to the pitch inverse filter 13 and the bit allocation calculation circuit 41. The pitch gain is sent to the pitch gain quantizer 16. The pitch lag index from the pitch analysis circuit 15 is taken out from the terminal 52 and sent to the decoder side.
[0030]
In the pitch gain quantizer 16, the pitch gains at the three points corresponding to the above three-point prediction are vector quantized, a codebook index (pitch gain index) is taken out from the output terminal 53, and the representative value vector, that is, the inverse-quantized output, is sent to the pitch inverse filter 13. The pitch inverse filter 13 outputs the pitch prediction residual of the three-point pitch prediction based on the pitch analysis result. This pitch prediction residual is sent to the divider 14 and the envelope extraction circuit 17.
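A three-point pitch inverse filter of this kind might be sketched as follows. The tap placement (gains applied to the samples K+1, K, and K−1 samples in the past) follows the three-point criterion given later in this description; the function name `pitch_inverse_filter` and the zero-history assumption at the start of the buffer are illustrative choices only.

```python
def pitch_inverse_filter(x, lag, gains):
    """Subtract a 3-tap long-term prediction from x.
    gains = (g_minus1, g_0, g_plus1) weight the samples lag+1, lag, lag-1 in the past.
    Samples before the start of x are treated as zero (an assumption)."""
    g_m1, g_0, g_p1 = gains

    def past(n, delay):
        idx = n - delay
        return x[idx] if idx >= 0 else 0.0

    return [x[n] - (g_m1 * past(n, lag + 1)
                    + g_0 * past(n, lag)
                    + g_p1 * past(n, lag - 1))
            for n in range(len(x))]

# A pulse train with a period of 50 samples is fully predicted after one period.
pulses = [1.0 if n % 50 == 0 else 0.0 for n in range(200)]
pitch_residual = pitch_inverse_filter(pulses, lag=50, gains=(0.0, 1.0, 0.0))
```

In this toy case only the first pulse survives, since every later pulse is predicted exactly by the sample one pitch period earlier.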
[0031]
The pitch analysis will be further described. In this pitch analysis, pitch parameters are extracted using the LPC residual. The pitch parameter includes a pitch lag and a pitch gain.
[0032]
First, the pitch lag is determined. For example, 512 samples are cut out from the center of the LPC residual, and x(n) (n = 0 to 511) is written as the vector $\mathbf{x}$. Let $\mathbf{x}_k$ denote the 512 samples located k samples in the past from $\mathbf{x}$. Then the pitch lag k is given as the value that minimizes
$$\left\| \mathbf{x} - g\,\mathbf{x}_k \right\|^2 .$$
That is, with
$$g = \frac{(\mathbf{x}, \mathbf{x}_k)}{\left\| \mathbf{x}_k \right\|^2},$$
the optimum lag K can be determined by searching for the k that maximizes
$$\frac{(\mathbf{x}, \mathbf{x}_k)^2}{\left\| \mathbf{x}_k \right\|^2}.$$
In the present embodiment, 12 ≦ K ≦ 240. This K may be used as it is, or a tracking result using the pitch lags of past frames may be used. Next, for the K determined in this way, the optimum pitch gains at three points (K, K−1, K+1) are determined. That is, the gains $g_{-1}$, $g_0$, $g_1$ that minimize
$$\left\| \mathbf{x} - \left( g_{-1}\,\mathbf{x}_{K+1} + g_0\,\mathbf{x}_{K} + g_1\,\mathbf{x}_{K-1} \right) \right\|^2$$
are obtained and taken as the three-point pitch gains for the optimum lag K. These three-point pitch gains are sent to the pitch gain quantizer 16 and are collectively vector quantized, and the pitch inverse filter 13 is constructed using the quantized pitch gains and the optimum lag K, whereby the pitch residual is obtained. The obtained pitch residual is connected to the already obtained past pitch residual and subjected to the MDCT as will be described later. At this time, time-axis gain control may be performed before the MDCT.
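The lag search can be sketched as below: for each candidate lag k the score (x, x_k)² / ‖x_k‖² is evaluated and the best k is kept. This is a minimal sketch assuming the LPC residual is held in a Python list with enough past samples before the target block; the names `inner` and `find_pitch_lag` are hypothetical, and the three-point gain solution is not included.

```python
def inner(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

def find_pitch_lag(residual, start, length=512, k_min=12, k_max=240):
    """Return (K, g): the lag maximizing (x, x_k)^2 / ||x_k||^2 and its one-tap gain.
    The target block is residual[start:start+length]; start must be >= k_max."""
    if start < k_max:
        raise ValueError("not enough past samples for the largest lag")
    x = residual[start:start + length]
    best_k, best_score = None, -1.0
    for k in range(k_min, k_max + 1):
        xk = residual[start - k:start - k + length]
        energy = inner(xk, xk)
        if energy == 0.0:
            continue  # a silent delayed block cannot predict anything
        score = inner(x, xk) ** 2 / energy
        if score > best_score:
            best_k, best_score = k, score
    if best_k is None:
        raise ValueError("all delayed blocks are silent")
    xk = residual[start - best_k:start - best_k + length]
    return best_k, inner(x, xk) / inner(xk, xk)

# Toy residual: a pulse every 50 samples, so the expected lag is 50.
toy_residual = [1.0 if n % 50 == 0 else 0.0 for n in range(240 + 512)]
lag, gain = find_pitch_lag(toy_residual, start=240)
```

Because the comparison is strict, the smallest of several equally good lags (here 50 rather than 100 or 150) is returned; a real encoder might instead apply the lag tracking mentioned in the text.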
[0033]
Here, FIG. 3 shows the relationship between the LPC analysis processing and the pitch analysis processing with respect to the input signal. The analysis length of one frame FR, for example 1024 samples, corresponds to the MDCT transform block described later. Time t₁ is the center of the new LPC analysis (LSP₁), and time t₀ is the center of the LPC analysis one frame before (LSP₀). The second half of the current frame is new data ND and the first half is previous data PD. In the figure, a is the LPC residual obtained by interpolating between LSP₀ and LSP₁, b is the LPC residual one frame before, c is the new pitch residual obtained by pitch analysis targeting this part (the second half of b plus the first half of a), and d is the past pitch residual. When all the new data ND in FIG. 3 has been input, a can be obtained, and the new pitch residual c can be calculated using a and the already obtained b. By connecting c to the already obtained pitch residual d, one frame of data FR to be orthogonally transformed can be created, and this frame can then be subjected to orthogonal transform processing such as MDCT.
[0034]
Next, FIG. 4 shows how the time axis signal changes as it passes through the LPC inverse filter based on the LPC analysis and the pitch inverse filter based on the pitch analysis, and FIG. 5 shows the corresponding change of the signal on the frequency axis. FIG. 4A shows the input signal waveform, and FIG. 5A shows its frequency spectrum. By passing this signal through the LPC inverse filter based on the LPC analysis, the waveform features are extracted and removed, giving a substantially periodic, pulse-shaped time axis waveform (LPC residual waveform) as shown in FIG. 4. The spectrum on the frequency axis corresponding to this LPC residual waveform is as shown in FIG. 5. By passing this LPC residual through the pitch inverse filter based on the pitch analysis described above, the pitch component is extracted and removed, giving a time axis signal close to white noise as shown in FIG. 4, whose spectrum on the frequency axis is as shown in FIG. 5.
[0035]
Further, in the embodiment of the present invention, the normalization (whitening) circuit unit 11 smooths the gain of the data within a frame. To do this, the envelope extraction circuit 17 extracts the envelope from the time axis waveform in the frame (the residual from the pitch inverse filter 13 in this embodiment), the extracted envelope is sent to the envelope quantizer 20 via the switch 19, and the time axis waveform (the residual from the pitch inverse filter 13) is divided by the quantized envelope value in the divider 14, thereby obtaining a signal smoothed on the time axis. The signal from the divider 14 is sent, as the output of the normalization (whitening) circuit unit 11, to the orthogonal transform circuit unit 25 in the next stage.
[0036]
By this smoothing, so-called noise shaping can be realized in which the magnitude of the quantization error when the orthogonal transform coefficient after quantization is inversely transformed into a time signal follows the envelope of the original signal.
[0037]
The envelope extraction in the envelope extraction circuit 17 will be described. Let the signal supplied to the envelope extraction circuit 17, that is, the residual signal normalized by the LPC inverse filter 12 and the pitch inverse filter 13, be x(n), n = 0 to N−1, where N is the number of samples of the one frame FR described above, which is the orthogonal transform window length (for example, N = 1024). The rms (root mean square) value of each sub-block or sub-frame cut out by a window of length M shorter than the transform window length N, for example M = N/8, is used as the envelope. That is, the rms of the i-th normalized sub-block (sub-frame), rms_i (i = 0 to (N/M)−1), is defined by the following equation (1).
[0038]
[Expression 1]

[0039]
The rms_i obtained from the above equation (1) can be scalar quantized for each i, or the whole set of rms_i can be vector quantized as one vector. In this embodiment, vector quantization is performed in the envelope quantizer 20, and the index is extracted from the terminal 21 as a parameter for time axis gain control, that is, an envelope index, and transmitted to the decoder side.
[0040]
Let qrms_i denote the rms_i of each sub-block (sub-frame) quantized in this way. The input residual signal x(n) is divided by this value in the divider 14, whereby the signal x_s(n) smoothed on the time axis is obtained. However, this gain control is performed only when the ratio of the largest to the smallest of the rms_i obtained in the frame is a certain value (for example, 4) or more, in which case a predetermined number of bits (for example, 7 bits) is assigned to the quantization of the parameter (the envelope index). When the ratio of the rms_i of the sub-blocks (sub-frames) in the frame is smaller than this value, normal processing without gain control is performed, and the bits for gain control are assigned to the quantization of other parameters, for example the frequency axis parameters (orthogonal transform coefficient data). Whether or not to perform gain control is determined by the gain control on/off determination circuit 18, and the determination output (gain control SW) is used as a switching control signal for the switch 19 on the input side of the envelope quantizer 20. It is also sent to a coefficient quantization circuit 45 in the coefficient quantization unit 40, which will be described later, and used for switching the number of bits allocated to the coefficients between gain control on and off. The gain control on/off determination output (gain control SW) is taken out via the terminal 22 and sent to the decoder side.
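The envelope extraction and the on/off decision can be sketched as follows. This is only an illustration under stated assumptions: the unquantized rms values stand in for qrms_i (no envelope quantizer is modelled), the frame length is assumed to be a multiple of the number of sub-blocks, and the small `floor` that guards against division by zero is an added safeguard rather than part of the description.

```python
import math

def subblock_rms(x, num_sub=8):
    """rms of each of num_sub equal-length sub-blocks of one frame."""
    m = len(x) // num_sub
    return [math.sqrt(sum(v * v for v in x[i * m:(i + 1) * m]) / m)
            for i in range(num_sub)]

def gain_smooth(x, num_sub=8, ratio_threshold=4.0, floor=1e-9):
    """Return (signal, envelope, gain_on).
    Gain control is applied only when max(rms)/min(rms) >= ratio_threshold."""
    envelope = [max(r, floor) for r in subblock_rms(x, num_sub)]
    gain_on = max(envelope) / min(envelope) >= ratio_threshold
    if not gain_on:
        return list(x), envelope, False
    m = len(x) // num_sub
    smoothed = [v / envelope[n // m] for n, v in enumerate(x)]
    return smoothed, envelope, True

# A frame with a sudden level jump triggers gain control; a flat frame does not.
attack_frame = [1.0] * 512 + [8.0] * 512
smoothed, envelope, gain_on = gain_smooth(attack_frame)
flat_frame = [2.0] * 1024
flat_out, _, flat_on = gain_smooth(flat_frame)
```

In the attack example the envelope ratio is 8, so the frame is divided by its sub-block rms and becomes flat; the flat frame is passed through unchanged, and its gain-control bits would be available for the coefficients.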
[0041]
The signal x_s(n), which has been gain-controlled (or gain-compressed) by the divider 14 and smoothed on the time axis, is sent to the orthogonal transform circuit unit 25 as the output of the normalization circuit unit 11 and converted into frequency axis parameters (coefficient data) by, for example, the MDCT. The orthogonal transform circuit unit 25 includes a windowing circuit 26 and an MDCT circuit 27. The windowing circuit 26 applies a window function so that MDCT aliasing cancellation by ½ frame overlap can be used.
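Windows used for MDCT aliasing cancellation with half-frame overlap are commonly chosen so that the squares of the window values at overlapping positions sum to a constant. The sketch below checks this property for a sine window; the sine window is a common choice assumed here for illustration, and the description does not specify which window the windowing circuit 26 uses.

```python
import math

def sine_window(length):
    """Sine window w(n) = sin(pi*(n + 0.5)/length), a common MDCT window."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

N = 1024
w = sine_window(N)
half = N // 2
# Largest deviation of w(n)^2 + w(n + N/2)^2 from 1 over the overlap region.
max_deviation = max(abs(w[n] ** 2 + w[n + half] ** 2 - 1.0) for n in range(half))
```

The deviation is at the level of floating-point rounding, since sin² and cos² of the same angle sum to one.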
[0042]
At the time of decoding on the decoder side, inverse quantization is performed from the quantization index of the transmitted frequency axis parameters (for example, MDCT coefficients), and the time axis signal is then obtained by the inverse orthogonal transform, which is a frequency axis / time axis transform. Then, using the transmitted and inverse-quantized time-axis gain control parameter, overlap addition and the inverse of the gain smoothing performed at encoding (gain expansion or gain restoration) are carried out. When gain smoothing is used, the usual overlap addition, which assumes a window in which the sum of squares of the window values at symmetric, overlapping positions is constant, cannot be used as it is, so the following processing is performed.
[0043]
That is, FIG. 6 is a diagram showing the state of overlap addition and gain control on the decoder side. In FIG. 6, w(n), n = 0 to N−1, represents the analysis/synthesis window, and g(n) is the time axis gain control parameter, that is,
$$g(n) = \mathrm{qrms}_j \quad (j \text{ satisfies } jM \le n < (j+1)M),$$
where g₁(n) is g(n) of the current frame FR₁ and g₀(n) is g(n) of one frame in the past (previous frame FR₀). In FIG. 6, one frame is divided into eight sub-blocks (sub-frames) SB (N/M = 8).
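The piecewise-constant gain g(n) can be built from the per-sub-block values as in the short sketch below. The function name `expand_gain` and the sample values are hypothetical; the sketch only expands the gains and does not implement the overlap addition of equations (2) to (4).

```python
def expand_gain(qrms, sub_len):
    """Return g(n) for one frame: g(n) = qrms[j] for j*sub_len <= n < (j+1)*sub_len."""
    return [qrms[n // sub_len] for n in range(len(qrms) * sub_len)]

# Eight sub-blocks of 128 samples give one 1024-sample frame.
qrms_values = [1.0, 1.0, 2.0, 4.0, 4.0, 2.0, 1.0, 1.0]
g = expand_gain(qrms_values, sub_len=128)
```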
[0044]
For the second half of the data of the previous frame FR₀, the encoder side divides by g₀(n + (N/2)) for gain control and then applies the analysis window w((N/2)−1−n) for the MDCT. Therefore, the signal obtained by applying the analysis window w((N/2)−1−n) again after the inverse MDCT on the decoder side, that is, the sum P(n) of the principal component and the aliasing component, is expressed by the following equation (2).
[0045]
[Expression 2]

[0046]
Also, for the first half of the data of the current frame FR₁, the encoder side divides by g₁(n) for gain control and then applies the analysis window w(n) for the MDCT. Therefore, the signal obtained by applying the analysis window w(n) again after the inverse MDCT on the decoder side, that is, the sum Q(n) of the principal component and the aliasing component, is expressed by the following equation (3).
[0047]
[Expression 3]

[0048]
Therefore, x (n) to be reproduced is obtained as the following equation (4).
[0049]
[Expression 4]

[0050]
By performing such windowing and performing gain control using the rms of each sub-block (sub-frame) as an envelope, it is possible to reduce quantization noise that is easily audible, such as pre-echo, for sounds that change rapidly in time, for example a musical sound with a sharp attack or a sound that decays relatively quickly between pitch peaks.
[0051]
Next, the MDCT coefficient data obtained by the MDCT circuit 27 of the orthogonal transform circuit unit 25 is sent to the frame gain normalization circuit 43 and the frame gain calculation/quantization circuit 47 of the coefficient quantization unit 40. In the coefficient quantization unit 40 of the present embodiment, the frame gain (block gain) of all the coefficients of one frame, which is the MDCT transform block, is first calculated and gain normalization is performed. The band is then divided into critical bands, which are subbands whose bandwidth becomes wider toward higher frequencies, a scale factor for each band, the so-called Bark scale factor, is calculated, and normalization is performed again with it. As the Bark scale factor, the peak value or the root mean square (rms) of the coefficients in each band can be used, and the Bark scale factors of the bands are collectively vector quantized.
[0052]
That is, the frame gain calculation/quantization circuit 47 of the coefficient quantization unit 40 calculates and quantizes the gain of each frame, which is the MDCT transform block, and the codebook index (frame gain index) is taken out via the terminal 55. The quantized frame gain is sent to the frame gain normalization circuit 43, where normalization is performed by dividing the input by the frame gain. The output normalized by the frame gain is sent to the Bark scale factor calculation/quantization circuit 42 and the Bark scale factor normalization circuit 44.
[0053]
In the bark scale factor calculation / quantization circuit 42, the bark scale factor for each critical band is calculated and quantized, and a codebook index (bark scale factor index) is taken out via the terminal 54 and sent to the decoder side. At the same time, the Bark scale factor of the quantized value is sent to the bit allocation calculation circuit 41 and the Bark scale factor normalization circuit 44. The Bark scale factor normalization circuit 44 normalizes the coefficients in the band for each critical band, and sends the coefficients normalized by the Bark scale factor to the coefficient quantization circuit 45.
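The two-stage normalization (frame gain, then per-band scale factor) might look like the sketch below. It assumes that both the frame gain and the Bark scale factor are rms values, that no quantization is applied, and that no band is entirely zero; the band edges, the function names `rms` and `normalize_coefficients`, and the coefficient values are hypothetical and are not real critical-band boundaries.

```python
import math

def rms(values):
    """Root mean square of a non-empty sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def normalize_coefficients(coeffs, band_edges):
    """Normalize by a frame gain, then by one scale factor per band.
    band_edges lists the start index of each band plus the final end index."""
    frame_gain = rms(coeffs)                      # assumption: rms as frame gain
    by_frame = [c / frame_gain for c in coeffs]
    scale_factors, normalized = [], []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band = by_frame[lo:hi]
        sf = rms(band)                            # assumption: rms as scale factor
        scale_factors.append(sf)
        normalized.extend(c / sf for c in band)
    return frame_gain, scale_factors, normalized

# Toy frame: a narrow low band and a wider high band (illustrative edges only).
coeffs = [3.0, -3.0, 3.0, -3.0] + [1.0, -1.0] * 4
band_edges = [0, 4, 12]
frame_gain, scale_factors, normalized = normalize_coefficients(coeffs, band_edges)
```

After both steps every band has unit rms, and multiplying a normalized coefficient by its band scale factor and the frame gain recovers the original value, which is what the decoder side relies on.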
[0054]
The coefficient quantization circuit 45 performs quantization by assigning a number of quantization bits to each coefficient in accordance with the bit allocation information from the bit allocation calculation circuit 41. At this time, the total number of allocated bits is switched according to the gain control SW information from the gain control on/off determination circuit 18. For example, when vector quantization is performed, it suffices to prepare two sets of codebooks, one for gain control on and one for gain control off, and to switch between them according to the gain control SW information.
[0055]
Here, the bit allocation in the bit allocation calculation circuit 41 will be described. The quantization weight for each MDCT coefficient is calculated based on the LPC coefficients, pitch parameters, Bark scale factor, and the like obtained as described above. From these weights, the bit allocation of the MDCT coefficients over the entire band is determined and quantization is performed. This weight can be regarded as a noise shaping factor, and a desired noise shaping characteristic can be given by changing each parameter. As an example, in this embodiment, the weight W(ω) is calculated using only the LPC coefficients, the pitch parameters, and the Bark scale factor, as shown in the following equation.
[0056]
[Expression 5]

[0057]
As described above, since the quantization weights are determined only by the LPC coefficients, the pitch parameters, and the Bark scale factor, the same bit allocation as in the encoder can be reproduced on the decoder side if only these three types of parameters are transmitted. There is therefore no need to send any information on the bit allocation itself, and the rate of side information (auxiliary information) can be lowered.
[0058]
Next, a specific example of quantization in the coefficient quantization circuit 45 will be described with reference to FIGS. 7 to 9.
[0059]
FIG. 7 shows an example of a specific configuration of the coefficient quantization circuit 45 in FIG. 2. The normalized coefficient data from the Bark scale factor normalization circuit 44 in FIG. 2, for example the MDCT coefficients y, is supplied as the input. The weight calculation circuit 2 is substantially equivalent to the bit allocation calculation circuit 41 of FIG. 2, but only the part that calculates the weight of each coefficient for allocating the quantization bits is shown. In the weight calculation circuit 2, the weight w is calculated based on parameters such as the LPC coefficients, the pitch parameters, and the Bark scale factor described above. Here, the coefficients of one frame are expressed as a vector y, and the weights of one frame as a vector w.
[0060]
The coefficient vector y and the weight vector w are sent to the band dividing circuit 3 and divided, as necessary, into L (L ≧ 1) bands. The number of bands may be, for example, about three (L = 3): a low band, a middle band, and a high band. However, the number of bands is not limited to this, and the band need not be divided at all. Denoting the coefficients of the k-th band by y_k and the corresponding weights by w_k (0 ≦ k ≦ L−1),
$$\mathbf{y} = (\mathbf{y}_0, \mathbf{y}_1, \ldots, \mathbf{y}_{L-1})$$
$$\mathbf{w} = (\mathbf{w}_0, \mathbf{w}_1, \ldots, \mathbf{w}_{L-1})$$
The number of bands for band division and the number of coefficients in each band are fixed to preset values.
[0061]
Next, the coefficient vectors y_0, y_1, ..., y_{L−1} of the bands are sent to the sort circuits 4_0, 4_1, ..., 4_{L−1}, respectively, and within each band the coefficients are ranked according to the order of the weights. For this purpose, the coefficients themselves within each band may be rearranged (sorted) according to the order of the weights. Alternatively, only the indices indicating the position or order of the coefficients on the frequency axis may be sorted in order of the weights, and the precision (number of allocated bits, etc.) used when quantizing each coefficient may be determined in correspondence with the sorted indices. When the coefficients themselves are sorted, the coefficients of the coefficient vector y_k of any k-th band are sorted in order of weight to obtain the sorted coefficient vector y'_k.
[0062]
FIG. 8 shows the state of this sorting. FIG. 8A shows the weight vector w_k of the k-th band, and FIG. 8B shows the coefficient vector y_k of the k-th band. In the example of FIG. 8, the number of elements in the k-th band is, for example, 8; the eight weights that are the elements of the weight vector w_k are denoted by w₁, w₂, ..., w₈, and the eight coefficients that are the elements of the coefficient vector y_k are denoted by y₁, y₂, ..., y₈. In the example of FIGS. 8A and 8B, the weight w₃ corresponding to the coefficient y₃ is the largest, followed in descending order by w₂, w₆, ..., w₄. FIG. 8C shows the coefficient vector y'_k obtained by sorting the coefficients y₁, y₂, ..., y₈ in this order of weights, giving y₃, y₂, y₆, ..., y₄.
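The sorting of FIG. 8 can be mimicked with the short sketch below. The weight values are invented so that w₃ is largest, then w₂, then w₆, with w₄ smallest, as in the figure; the order of the remaining elements is an assumption, and the function name `sort_by_weight` is hypothetical.

```python
def sort_by_weight(coeffs, weights):
    """Return (sorted_coeffs, order), with coefficients arranged by descending weight.
    order[r] is the original position of the coefficient ranked r-th."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return [coeffs[i] for i in order], order

# Labels y1..y8 stand in for coefficient values; weights are illustrative only.
labels = ["y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8"]
band_weights = [0.5, 0.8, 0.9, 0.1, 0.4, 0.7, 0.3, 0.2]
sorted_labels, order = sort_by_weight(labels, band_weights)
# sorted_labels is ['y3', 'y2', 'y6', 'y1', 'y5', 'y7', 'y8', 'y4']
```

Only the weights determine the order, so any party that can recompute the same weights can recompute the same permutation.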
[0063]
Next, the coefficient vectors y'_0, y'_1, ..., y'_{L−1} of the bands, sorted according to the order of the weights within each band as described above, are sent to the vector quantizers 5_0, 5_1, ..., 5_{L−1}, respectively, and vector quantized. Here, it is preferable to fix the number of bits allocated to each band in advance, so that the allocation of quantization bits to the bands does not change even if the energy of each band changes.
[0064]
Regarding the vector quantization of each band, when the number of elements in one band is large, the band may be divided into several subvectors and vector quantization performed on each subvector. That is, the sorted coefficient vector y'_k of any k-th band is divided into several subvectors of predetermined numbers of elements, as shown in FIG. 9, for example into three subvectors y'_k1, y'_k2, y'_k3; these are each vector quantized to obtain the codebook indices c_k1, c_k2, c_k3. The vector of these indices c_k1, c_k2, c_k3 of the k-th band is taken as the coefficient index vector c_k. Here, in the quantization of the subvectors, quantization according to the weights is achieved by assigning a larger number of quantization bits to the subvectors nearer the head. For example, in FIG. 9, bits are assigned in decreasing order, such as 8 bits to the vector y'_k1, 6 bits to the vector y'_k2, and fewer still to the vector y'_k3, so that the number of bits allocated per coefficient decreases from the head, and bit assignment according to the weights is realized.
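The division of a sorted band into subvectors can be sketched as follows. The subvector sizes, the bit counts (8, 6, 4), the coefficient values, and the function name `split_into_subvectors` are all illustrative assumptions; no actual codebook search is shown.

```python
def split_into_subvectors(sorted_coeffs, sizes):
    """Cut a weight-sorted coefficient vector into consecutive subvectors."""
    if sum(sizes) != len(sorted_coeffs):
        raise ValueError("sizes must cover the whole band")
    subvectors, pos = [], 0
    for size in sizes:
        subvectors.append(sorted_coeffs[pos:pos + size])
        pos += size
    return subvectors

# Coefficients already sorted by descending weight (values are made up).
sorted_band = [0.9, -0.7, 0.6, 0.4, -0.3, 0.2, 0.1, -0.05]
sizes = [2, 3, 3]          # hypothetical predetermined element counts
bits = [8, 6, 4]           # hypothetical codebook index sizes, largest first
subvectors = split_into_subvectors(sorted_band, sizes)
bits_per_coeff = [b / s for b, s in zip(bits, sizes)]
```

With these illustrative numbers the head subvector receives 4 bits per coefficient and the tail subvector about 1.3, so coefficients with larger weights are represented more precisely.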
[0065]
Next, the coefficient index vectors c_0, c_1, ..., c_{L−1} of the bands from the vector quantizers 5_0, 5_1, ..., 5_{L−1} in FIG. 7 are combined into the coefficient index vector c for all bands, which is taken out from the terminal 6. This terminal 6 corresponds to the terminal 51 of FIG. 2.
[0066]
In the specific examples of FIGS. 7 to 9, the orthogonally transformed coefficients on the frequency axis (for example, MDCT coefficients) themselves are sorted according to the weights, and the number of allocated bits decreases with the order of the sorted coefficients (more bits are assigned to coefficients ranked higher after sorting). Alternatively, only the indices indicating the position or order on the frequency axis of each coefficient obtained by the orthogonal transform may be sorted in order of the weights, and the accuracy (number of allocated bits, etc.) used when quantizing each coefficient may be determined in correspondence with the sorted indices. Although vector quantization is used as the coefficient quantization in the specific example described above, the present invention can easily be applied to scalar quantization, or to quantization using both scalar quantization and vector quantization.
[0067]
Next, an example of the configuration of an audio signal decoding apparatus (decoder side) corresponding to the audio signal encoding apparatus (encoder side) shown in FIG. 2 will be described with reference to FIG. 10.
[0068]
In FIG. 10, the data from the output terminals in FIG. 2 is supplied to the input terminals 60 to 67. The input terminal 60 is supplied with the index of the orthogonal transform coefficients (for example, MDCT coefficients) from the output terminal 51 in FIG. 2. The input terminal 61 is supplied with the LSP index from the output terminal 31 in FIG. 2. The input terminals 62 to 65 are supplied with the data from the output terminals 52 to 55 in FIG. 2, that is, the pitch lag index, the pitch gain index, the Bark scale factor index, and the frame gain index, respectively. The input terminals 66 and 67 are supplied with the envelope index and the gain control SW from the output terminals 21 and 22 in FIG. 2, respectively.
[0069]
The coefficient index from the input terminal 60 is dequantized by a coefficient dequantization circuit 71 and sent to an inverse orthogonal transform circuit 74 such as IMDCT (inverse MDCT) via a multiplier 73.
[0070]
The LSP index from the input terminal 61 is sent to the inverse quantizer 81 of the LPC parameter reproducing unit 80, dequantized into LSP data, and sent to the LSP → α conversion circuit 82 and the LSP interpolation circuit 83. The α parameter (LPC coefficient) from the LSP → α conversion circuit 82 is sent to the bit allocation circuit 72. The LSP data from the LSP interpolation circuit 83 is converted into an α parameter (LPC coefficient) by the LSP → α conversion circuit 84 and sent to an LPC synthesis circuit 77 described later.
[0071]
In addition to the LPC coefficients from the LSP → α conversion circuit 82, the bit allocation circuit 72 is supplied with the pitch lag from the input terminal 62, the pitch gain obtained from the input terminal 63 via the inverse quantizer 91, and the Bark scale factor obtained from the input terminal 64 via the inverse quantizer 92, and the same bit allocation as on the encoder side can be reproduced based only on these parameters. The bit allocation information from the bit allocation circuit 72 is sent to the coefficient inverse quantizer 71 and used to determine the quantization bits allocated to each coefficient.
[0072]
The frame gain index from the input terminal 65 is sent to the frame gain dequantizer 86 and dequantized, and the obtained frame gain is sent to the multiplier 73.
[0073]
The envelope index from the input terminal 66 is sent to the envelope inverse quantizer 88 via the switch 87 and inversely quantized, and the obtained envelope data is sent to the overlap addition circuit 75. The gain control SW information from the input terminal 67 is sent to the coefficient inverse quantizer 71 and the overlap addition circuit 75 and is also used as a control signal for the switch 87. The coefficient inverse quantizer 71 switches the total number of allocated bits in accordance with the on/off state of the gain control as described above. In the case of inverse vector quantization, it suffices to switch between the codebook for gain control on and the codebook for gain control off.
[0074]
The overlap addition circuit 75 adds the signals returned to the time axis for each frame from the inverse orthogonal transformation circuit 74 such as IMDCT while overlapping each frame by 1/2, and when gain control is on. Then, overlap addition is performed while performing gain control (gain expansion or gain restoration described above) based on envelope data from the envelope inverse quantizer 88.
[0075]
The time axis signal from the overlap addition circuit 75 is sent to the pitch synthesis circuit 76 to restore the pitch component. This corresponds to the inverse of the processing by the pitch inverse filter 13 of FIG. 2, and uses the pitch lag from the terminal 62 and the pitch gain from the inverse quantizer 91.
[0076]
The output from the pitch synthesis circuit 76 is sent to the LPC synthesis circuit 77, subjected to LPC synthesis processing corresponding to the inverse of the processing in the LPC inverse filter 12 of FIG. 2, and then output.
[0077]
Here, when the coefficient quantization circuit 45 of the coefficient quantization unit 40 on the encoder side vector quantizes coefficients that have been sorted according to the weights in each band as shown in FIG. 7, a configuration such as that shown in FIG. 11 can be used as the coefficient inverse quantization circuit 71.
[0078]
In FIG. 11, the input terminal 60 corresponds to the input terminal 60 of FIG. 10 and is supplied with the coefficient index (a codebook index obtained by quantizing orthogonal transform coefficients such as MDCT coefficients). The weight calculation circuit 79 is supplied with the α parameters (LPC coefficients) from the LSP → α conversion circuit 82 in FIG. 10, the pitch lag from the input terminal 62, the pitch gain from the inverse quantizer 91, and the Bark scale factor from the inverse quantizer 92. The weight calculation circuit 79 corresponds to the part of the bit allocation circuit 72 shown in FIG. 10 up to the point where the weight of each coefficient, computed in the course of calculating the quantization bit allocation, is obtained. As described above, the weight calculation circuit 79 calculates the weight W(ω) by the above equation (5) using only the LPC coefficients, the pitch parameters (pitch lag and pitch gain), and the Bark scale factor. The input terminal 93 is supplied with indices indicating the position or order of the coefficients on the frequency axis, that is, when there are N coefficient data in the entire band, the numerical values 0 to N−1 (represented by the vector I). The N weights for the N coefficients from the weight calculation circuit 79 are represented by the vector w.
[0079]
The weights w from the weight calculation circuit 79 and the indices I from the input terminal 93 are sent to the band dividing circuit 94 and divided into L bands in the same way as on the encoder side. For example, if the encoder side divides the band into three bands (L = 3), a low band, a middle band, and a high band, the decoder side also divides it into three bands. The indices and weights of each band obtained by the band division are sent to the sort circuits 95_0, 95_1, ..., 95_{L−1}, respectively. For example, the indices I_k and the weights w_k of the k-th band are sent to the k-th sort circuit 95_k. In the sort circuit 95_k, the indices I_k of the k-th band are sorted according to the order of the weights w_k of the coefficients, and the sorted indices I'_k are output. The indices I'_0, I'_1, ..., I'_{L−1} sorted in each band by the sort circuits 95_0, 95_1, ..., 95_{L−1} are sent to the coefficient reconstruction circuit 97.
[0080]
Meanwhile, the orthogonal transform coefficient index from the input terminal 60 of FIG. 10 was obtained on the encoder side, as described above, by dividing the coefficients into L bands, sorting the coefficients of each band in order of weight, dividing them within each band into subvectors whose number is determined by a predetermined rule, and vector-quantizing each subvector. Specifically, for the L bands, the sets of coefficient indexes of the respective bands are denoted by the vectors c_0, c_1, ..., c_{L-1}, and these vectors are sent to the inverse quantizers 96_0, 96_1, ..., 96_{L-1}, respectively. The coefficient data obtained by inverse quantization in the inverse quantizers 96_0, 96_1, ..., 96_{L-1} are sorted in order of weight within each band; that is, they correspond to the coefficient vectors y'_0, y'_1, ..., y'_{L-1} from the encoder-side sort circuits 4_0, 4_1, ..., 4_{L-1}, and their arrangement differs from the positions on the frequency axis. The coefficient reconstruction circuit 97 therefore takes the index I, which represents the positions of the coefficients on the frequency axis and has been sorted in the same way in advance, matches the sorted index with the inverse-quantized coefficient data, and restores the original order on the frequency axis. That is, the coefficient reconstruction circuit 97 associates the coefficient data from the inverse quantizers 96_0, 96_1, ..., 96_{L-1}, which are sorted in order of weight within each band, with the sorted indices of the respective bands from the sort circuits 95_0, 95_1, ..., 95_{L-1}, and rearranges (reverse-sorts) the inverse-quantized coefficient data according to the sorted indices. The resulting coefficient data y, in the original order on the frequency axis, is taken out from the output terminal 98. The coefficient data from the output terminal 98 is sent to the multiplier 73.
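The reverse sorting performed by the coefficient reconstruction circuit 97 amounts to a scatter operation: each inverse-quantized coefficient is written back to the frequency position named by its sorted index. The sketch below is a minimal illustration under assumed data; the function name `reconstruct_coefficients` and all numerical values are hypothetical.

```python
def reconstruct_coefficients(sorted_indices_per_band, coeffs_per_band, n_coeffs):
    """Place weight-sorted coefficients back at their frequency-axis positions.

    sorted_indices_per_band: sorted indices I'_k, one list per band
    coeffs_per_band:         inverse-quantized coefficients y'_k, same shapes
    n_coeffs:                total number N of coefficients over all bands
    """
    y = [0.0] * n_coeffs
    for idx_band, coef_band in zip(sorted_indices_per_band, coeffs_per_band):
        if len(idx_band) != len(coef_band):
            raise ValueError("index and coefficient vectors must match in length")
        for position, value in zip(idx_band, coef_band):
            y[position] = value  # scatter: undo the per-band sort
    return y


# Hypothetical example with N = 8 coefficients in L = 3 bands.
I_sorted = [[0, 2, 1], [5, 3, 4], [7, 6]]                     # I'_0, I'_1, I'_2
y_sorted = [[9.0, 5.0, 2.0], [8.0, 7.0, 1.0], [4.0, 3.0]]     # y'_0, y'_1, y'_2

y = reconstruct_coefficients(I_sorted, y_sorted, 8)
print(y)  # [9.0, 2.0, 5.0, 7.0, 1.0, 8.0, 3.0, 4.0]
```

No explicit inverse permutation needs to be computed; the sorted index itself serves as the write address for each coefficient.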
[0081]
The present invention is not limited to the above embodiment. For example, the time-axis input signal may be, in addition to an audio signal including voice and music, a telephone-band speech signal, a video signal, or the like. The configuration of the normalization circuit unit 11, the LPC analysis, and the pitch analysis are likewise not limited to those described; various configurations may be adopted that extract and remove the features or correlation of the time-axis input waveform by linear prediction, nonlinear prediction, or the like. Each quantizer may use scalar quantization instead of vector quantization, or may use both scalar quantization and vector quantization.
[0082]
[Effects of the invention]
As is apparent from the above description, according to the present invention, in signal encoding in which a time-axis input signal is encoded using an orthogonal transform, the correlation or characteristic portion of the signal waveform is removed on the time axis, prior to the orthogonal transform, based on the results of linear predictive coding (LPC) analysis and pitch analysis. The orthogonal transform is therefore applied to a noise-like residual signal that is close to white noise, and the encoding efficiency can be improved.
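The whitening effect of the LPC inverse filter can be illustrated with a small numerical sketch. Everything here is an assumption made for the example: the synthetic second-order autoregressive test signal, the prediction order of 2, and the use of the autocorrelation method with the Levinson-Durbin recursion. Only the LPC stage is shown; the pitch inverse filter and the envelope gain control are omitted.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return inverse-filter coefficients a[0..order], with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err                      # reflection coefficient
        prev = a[:]
        for k in range(1, m):
            a[k] = prev[k] + k_m * prev[m - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m                # remaining prediction error
    return a


def inverse_filter(x, a):
    """Residual e[n] = sum_k a[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]


def lag1_correlation(s):
    """Normalized autocorrelation at lag 1 (near 0 for white noise)."""
    r = autocorr(s, 1)
    return r[1] / r[0]


# Synthetic, strongly correlated test signal (assumption, not from the source).
rng = random.Random(0)
x = []
for n in range(2000):
    x1 = x[n - 1] if n >= 1 else 0.0
    x2 = x[n - 2] if n >= 2 else 0.0
    x.append(1.6 * x1 - 0.8 * x2 + rng.gauss(0.0, 1.0))

a = levinson_durbin(autocorr(x, 2), 2)
residual = inverse_filter(x, a)

energy_in = sum(v * v for v in x)
energy_res = sum(v * v for v in residual)
print("lag-1 correlation: input %.3f, residual %.3f"
      % (lag1_correlation(x), lag1_correlation(residual)))
print("energy ratio residual/input: %.3f" % (energy_res / energy_in))
```

In this sketch the residual has much less energy and much weaker sample-to-sample correlation than the input, which is the property that makes the subsequent orthogonal transform and quantization more efficient.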
[0083]
In addition, when a time-axis input signal is encoded using an orthogonal transform, the bit allocation used to quantize the coefficients obtained by the orthogonal transform is determined based on the results of the linear predictive coding (LPC) analysis and the pitch analysis of the input signal. The encoder therefore need only send the parameters of the LPC analysis result and the pitch analysis result, without sending information dedicated to bit allocation, and the decoder can reproduce the same bit allocation as the encoder. This suppresses the rate of additional information (side information), lowers the overall bit rate, and improves the encoding efficiency.
[0084]
Further, by using the modified discrete cosine transform (MDCT) as the orthogonal transform, an audio signal can be encoded with high efficiency and good sound quality.
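As a rough illustration of why the MDCT suits frame-based coding with overlap addition on the decoder side, the sketch below applies a windowed MDCT and inverse MDCT to 50%-overlapping frames and checks that overlap addition restores the signal. The sine window, the frame size M = 8, the test signal, and the function names are assumptions for this example; the windowing circuit of the embodiment may differ, and no quantization is applied here.

```python
import math


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + M]^2 = 1 for length = 2M."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, window):
    """Windowed MDCT: 2M time samples -> M frequency coefficients."""
    m = len(frame) // 2
    return [sum(frame[n] * window[n]
                * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                for n in range(2 * m))
            for k in range(m)]


def imdct(coeffs, window):
    """Windowed inverse MDCT: M coefficients -> 2M time-aliased samples."""
    m = len(coeffs)
    return [2.0 / m * window[n]
            * sum(coeffs[k] * math.cos(math.pi / m * (n + 0.5 + m / 2) * (k + 0.5))
                  for k in range(m))
            for n in range(2 * m)]


M = 8                                   # hop size; each frame has 2M samples
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(4 * M)]
window = sine_window(2 * M)

# Overlap-add of the inverse-transformed frames (hop = M samples).
output = [0.0] * len(signal)
for start in range(0, len(signal) - 2 * M + 1, M):
    block = imdct(mdct(signal[start:start + 2 * M], window), window)
    for n, value in enumerate(block):
        output[start + n] += value

# Only samples covered by two overlapping frames are fully reconstructed.
max_err = max(abs(output[n] - signal[n]) for n in range(M, len(signal) - M))
print("max reconstruction error in the overlapped region: %.2e" % max_err)
```

Each frame of 2M samples yields only M coefficients, so the transform is critically sampled even though adjacent frames overlap by half; the time-domain aliasing of each frame cancels in the overlap addition.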
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing an audio signal encoding apparatus which is a more specific configuration example of an embodiment of the present invention.
FIG. 3 is a diagram illustrating a relationship between an LPC analysis process and a pitch analysis process for an input signal.
FIG. 4 is a time axis signal waveform diagram for explaining the removal of correlation by LPC analysis and pitch analysis of a time axis input signal.
FIG. 5 is a diagram illustrating frequency characteristics for explaining the removal of correlation by LPC analysis and pitch analysis of a time-axis input signal.
FIG. 6 is a time-axis signal waveform diagram for explaining overlap addition on the decoder side.
FIG. 7 is a block diagram illustrating an example of a specific configuration of a coefficient quantization circuit.
FIG. 8 is a diagram for explaining sorting according to the weights of coefficients in one band obtained by band division.
FIG. 9 is a diagram for describing a process of vector quantization by dividing a coefficient sorted according to weight in one band obtained by band division into sub-vectors.
FIG. 10 is a block diagram showing an example of an audio signal decoding apparatus as a decoding-side configuration corresponding to the audio signal encoding apparatus of FIG. 2.
FIG. 11 is a block diagram showing a specific example of an inverse quantization circuit of the audio signal decoding apparatus of FIG. 10.
[Explanation of symbols]
11 normalization circuit unit, 12 LPC inverse filter, 13 pitch inverse filter, 15 pitch analysis circuit, 16 pitch gain quantization circuit, 17 envelope extraction circuit, 18 gain control ON/OFF decision circuit, 20 envelope quantization circuit, 25 orthogonal transform circuit unit, 26 windowing circuit, 27 MDCT circuit, 30 LPC analysis/quantization unit, 32 LPC analysis circuit, 33 α → LSP conversion circuit, 34 LSP quantization circuit, 36 LSP interpolation circuit, 37, 38 LSP → α conversion circuit, 40 coefficient quantization unit, 41 bit allocation calculation circuit, 42 Bark scale factor calculation/quantization circuit, 43 frame gain normalization circuit, 44 Bark scale factor normalization circuit, 45 coefficient quantization circuit, 47 frame gain calculation/quantization circuit

Claims

1. A signal encoding apparatus comprising:
normalization means for extracting a residual based on information obtained by performing linear predictive coding (LPC) analysis and pitch analysis on an input signal on the time axis, and for performing gain control that smooths the gain using quantized envelope values on the time axis;
orthogonal transform means for performing an orthogonal transform on the output from the normalization means that has been smoothed on the time axis; and
quantization means for quantizing the output from the orthogonal transform means.

2. The signal encoding apparatus according to claim 1, wherein the orthogonal transform means converts the input time-axis signal into coefficient data on the frequency axis by a modified discrete cosine transform (MDCT).

3. The signal encoding apparatus according to claim 2, wherein the normalization means comprises: an LPC inverse filter that outputs an LPC prediction residual of the input signal based on LPC coefficients obtained by LPC analysis of the input signal; and a pitch inverse filter that removes the pitch correlation of the LPC prediction residual based on pitch parameters obtained by pitch analysis of the LPC prediction residual.

4. The signal encoding apparatus according to claim 2, wherein the quantization means performs quantization according to a number of allocated bits determined based on the LPC analysis result and the pitch analysis result.

5. A signal encoding method comprising:
extracting a residual based on information obtained by performing linear predictive coding (LPC) analysis and pitch analysis on an input signal on the time axis, and normalizing by performing gain control that smooths the gain using quantized envelope values on the time axis;
applying an orthogonal transform to the output smoothed on the time axis; and
quantizing the orthogonally transformed output.

6. The signal encoding method according to claim 5, wherein a modified discrete cosine transform (MDCT) is used as the orthogonal transform.

7. A signal encoding apparatus comprising:
analysis means for analyzing an input signal on the time axis and extracting features of the signal waveform;
normalization means for extracting a residual based on the analysis result from the analysis means, and for performing gain control that smooths the gain using quantized envelope values on the time axis;
orthogonal transform means for performing an orthogonal transform on the output from the normalization means that has been smoothed on the time axis;
quantization means for quantizing the output from the orthogonal transform means; and
bit allocation calculation means for determining the bit allocation for quantization in the quantization means based on the analysis result from the analysis means.

8. The signal encoding apparatus according to claim 7, wherein the orthogonal transform means converts the input time-axis signal into coefficient data on the frequency axis by a modified discrete cosine transform (MDCT).

9. The signal encoding apparatus according to claim 8, wherein the analysis means comprises LPC analysis means for performing linear predictive coding (LPC) analysis on the input signal and outputting LPC coefficients, and pitch analysis means for performing pitch analysis on the LPC prediction residual and outputting pitch parameters, and
the normalization means comprises an LPC inverse filter that outputs the LPC prediction residual of the input signal based on the LPC coefficients from the LPC analysis means, and a pitch inverse filter that removes the pitch correlation of the LPC prediction residual based on the pitch parameters from the pitch analysis means.

10. The signal encoding apparatus according to claim 9, wherein the bit allocation calculation means determines the bit allocation for quantizing the coefficient output from the orthogonal transform means based on the LPC coefficients from the LPC analysis means, the pitch parameters from the pitch analysis means, and Bark scale factors obtained for each critical band of the coefficient output from the orthogonal transform means.

11. A signal encoding method comprising:
extracting a residual based on analysis results obtained by performing linear predictive coding (LPC) analysis and pitch analysis on an input signal on the time axis, and normalizing by performing gain control that smooths the gain using quantized envelope values on the time axis;
applying an orthogonal transform to the output smoothed on the time axis; and
quantizing the orthogonally transformed output, wherein the bit allocation for the quantization is determined based on the analysis results.

12. The signal encoding method according to claim 11, wherein a modified discrete cosine transform (MDCT) is used as the orthogonal transform.

13. The signal encoding method according to claim 11, wherein the bit allocation is also determined using Bark scale factors obtained for each critical band of the coefficients obtained by the orthogonal transform.