JP3557164B2

JP3557164B2 - Audio signal encoding method and program storage medium for executing the method

Info

Publication number: JP3557164B2
Application number: JP2000282129A
Authority: JP
Inventors: 直樹岩上; 岳至森
Original assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Current assignee: Nippon Telegraph and Telephone Corp; NTT Inc
Priority date: 2000-09-18
Filing date: 2000-09-18
Publication date: 2004-08-25
Anticipated expiration: 2020-09-18
Also published as: JP2002091497A

Description

【０００１】
【産業上の利用分野】
この発明は、オーディオ信号を高能率にディジタル符号にする符号化方法及びその方法を実施するプログラムが記録された記憶媒体を提供するものであり、オーディオ信号の録音・再生や、オーディオ信号の通信路を使った伝送、放送などに利用できる。
【０００２】
【従来の技術】
オーディオ信号を高能率に符号化する従来からの手法として、例えば図１に示す変換符号化方法がある。符号化装置１０Ａでは、離散信号サンプル列として入力されたオーディオ信号を、一定サンプル数の入力ごとに時間／周波数変換部１１により時間／周波数変換を行って、周波数領域の一連の係数（以下、周波数領域係数と呼ぶ）にしてから符号化を行う。図１の例では、平坦化部１２により周波数領域係数を平坦化処理を行ってからベクトル量子化部１３で量子化を行ない、平坦化部１２で平坦化に使用した平坦化情報を表す符号とベクトル量子化部１３でのベクトル符号化による符号を多重化部１４で多重化し、出力する。
【０００３】
復号化装置１０Ｂでは、逆多重化部１５で受信した多重化符号を平坦化情報を表す符号とベクトル符号に分離し、ベクトル量子化復号化部１６でベクトル符号から平坦化周波数領域係数を得て、平坦化情報を使って逆平坦化部１７により平坦化周波数領域係数を逆平坦化して周波数領域係数を再生し、周波数／時間変換部１８により再生周波数領域係数を時間領域信号に変換して出力する。
周波数領域重み付けインタリーブベクトル量子化（Ｔｒａｎｓｆｏｒｍ−ｄｏｍａｉｎＷｅｉｇｈｔｅｄＩｎｔｅｒｌｅａｖｅＶｅｃｔｏｒＱｕａｎｔｉｚａｔｉｏｎ，ＴＷＩＮＶＱ）方式はこの例に当てはまる。ＴＷＩＮＶＱ方式は、入力信号を時間／周波数変換した後、２段階の平坦化の手順を経た後に、ベクトル量子化により符号化を行っている。ベクトル量子化は、目標ベクトルとの距離が最小であるコードベクトルをコードブックから選択し、このコードベクトルを用いて復号化装置側でベクトルを再生する。このような符号化方法では、再生ベクトルと目標ベクトルとの距離は小さく高能率に符号化ができるが、ベクトル中の個々の要素についての目標値からの誤差を制御することは困難である。従って、ベクトル量子化を用いた符号化方式では、復号化装置側で再生した信号の周波数特性が、原信号の周波数特性から歪み、再生音質の劣化を招いてしまうことがある。スペクトルの形状が複雑な時、例えば入力信号に強いトーン性成分が含まれるときには、ベクトル量子化に大きな負担をかけるため、この性質が強く現れやすく、音質劣化の原因になってしまう。
【０００４】
この問題を解決するための技術が、特願２０００−０７８３７０「オーディオ信号符号化方法及び復号化方法、これらの装置及びプログラム記憶媒体」において提案されている。ここでは、音源のスペクトルが複雑な形状をしている場合、周波数領域係数を強弱の２系統に分離し、おのおのの系統ごとに重み付けをしたベクトル量子化をすることにより、より細かいサンプルごとの量子化誤差の制御を可能にしている。しかし、この方法では、周波数領域係数の分離情報の符号化に比較的多くのビット数を必要とし、特に強弱の出現確率が均等に近くなってくる低周波数領域では非効率を招くことがある。
【０００５】
別の解決策の手法として、ベクトル量子化に大きな影響を与える特定のサンプルだけ予め取り除いて符号化するものがあり、特願平７−２６１２３６「音響信号変換符号化方法及び復号化方法」及び７−２４８１４５「変換符号化方法及び変換復号化方法」において出願されている。前者の出願では、重要度の高いサンプルを予め取り除き、残りをベクトル量子化しているが、この方法では、取り除くサンプルの周波数軸上での位置の情報を符号化しなくてはならないので符号化の能率が悪い。後者の方法では、スペクトルに周期的なスパイクが生じるピッチ性の音源について、スパイクの基本周波数の整数倍に位置するサンプルを取り除いて別個符号化する。この方法では、符号化能率は良いが、トライアングルなどの非整数倍音構造をもつ音源に対しては効果が薄く、汎用性に乏しい。
【０００６】
なお、ＴＷＩＮＶＱ方式については、岩上他「周波数領域重み付けインタリーブベクトル量子化（ＴＷＩＮＶＱ）による楽音符号化」電子情報通信学会論文誌Ｖｏｌ．Ｊ８０−Ａ，ｐｐ．８３０〜８３７及びＩＳＯ／ＩＥＣ標準ＩＳＯ／ＩＥＣ１４４９６−３ Ｉｎｆｏｒｍａｔｉｏｎ Ｔｅｃｈｎｏｌｏｇｙ：Ｃｏｄｉｎｇ ｏｆ Ａｕｄｉｏ−Ｖｉｓｕａｌ Ｏｂｊｅｃｔｓ（ＭＰＥＧ−４ Ａｕｄｉｏ）に詳細が記述されている。また、ベクトル量子化技術全般に関する詳細は、古井他著「ベクトル量子化と情報圧縮」（コロナ社、１９９８）に述べられている。
【０００７】
【発明が解決しようとする課題】
この発明は、入力信号のスペクトルを符号化するオーディオ信号の変換符号化方法において、複雑な形状を持つスペクトルを、高い汎用性で能率よく符号化する方法を提供することを課題とする。
【０００８】
【課題を解決するための手段】
符号器では、入力された信号の系列を一定時間ごとに周波数領域に変換し、得られた周波数領域の係数を、近傍同士をまとめた小帯域に分割する。小帯域ごとに、係数を符号化するモデルを決定し、その決定された方法で小帯域に属する係数を量子化及び符号化する。
【０００９】
【作用】
例えば、符号化のモデルとしてベクトル量子化を用いた方法とスカラ量子化を用いた方法の２種類を持つ場合、ベクトル量子化は、少ないビット数でも高能率に量子化ができる反面、サンプルごとの歪みの制御が困難である。スカラ量子化は、サンプルごとの独立性は良いが、量子化ビット数が足りないと能率が大幅に劣化してしまう。この発明では、このように、相異なる性質を持つ符号化モデルを複数持ち、入力スペクトルの各小帯域ごとに最適なモデルを選択することにより、汎用性が高く高能率な符号化を可能にしている。
【００１０】
【発明の実施の形態】
図２に本発明の第１の実施例を示す。図２の実施例において、符号化装置１０Ａは時間／周波数変換部１０と、符号化モデル選択部２０と、量子化・符号化部３０と、多重化部４０とにより構成され、例えば、オーディオ信号などの離散サンプル列を端子９に入力して、符号化したビット列を多重化部４０から出力する。即ち、入力信号は時間／周波数変換部１０において例えば変形離散コサイン変換（ＭＤＣＴ）により周波数領域係数に変換され、この周波数領域係数は符号化モデル選択部２０において一連の周波数領域係数を複数の係数毎に区切った小帯域毎に予め決めた複数の符号化方法のどれを使用して符号化を行うかが選択指定され、その選択情報を量子化・符号化部３０に与えると共に、選択情報の符号を多重化部４０に与える。周波数領域係数は量子化・符号化部３０において選択情報により指定された符号化方法により符号化され、その周波数領域係数の符号（以下、係数符号と呼ぶ）が多重化部４０に与えられる。多重化部４０は選択情報符号と係数符号を多重化し、ビット列として出力する。
【００１１】
復号化装置１０Ｂは逆多重化部５０と、符号化モデルの選択情報再生部６０と、係数再生部７０と、周波数／時間変換部８０とから構成され、入力された多重化符号ビット列を復号し、時間領域の離散サンプル列であるオーディオ信号を出力端子９１から出力する。即ち、入力された多重化符号ビット列は逆多重化部５０において符号化モデルの選択情報符号と係数符号とに分離され、選択情報符号は復号化モデル選択再生部６０に与えられて選択情報に再生され、その選択情報は係数再生部７０に与えられる。逆多重化部５０からの係数符号は係数再生部７０に与えられ、符号化装置１０Ａにおける符号化モデル選択部２０により各小帯域に対し選択された符号化方法に対応する復号化方法が選択情報により選択され、係数符号に対する復号化が行われ、周波数領域係数が再生される。この周波数領域係数は周波数／時間変換部８０により時間領域のサンプル系列に変換され、復号結果として端子９１に出力される。
【００１２】
次に符号化装置１０Ａの各部について図５に示す周波数領域係数に対する処理を参照して詳細に説明する。
時間／周波数変換部１０
端子９に入力されたオーディオ信号の離散サンプル列は、時間／周波数変換部１０に入力され、一定数Ｎの入力サンプル（１フレームとする）ごとに時間／周波数変換を行い、周波数領域のＮ個の係数に変換する。時間／周波数変換の方法としては、離散コサイン変換（ＤＣＴ）や、変形離散コサイン変換（ＭＤＣＴ）を用いることができる。図５に示す例では、各丸印が１つの係数値を表し、ここでは２チャネルオーディオ信号がフレーム毎に交互に周波数領域係数に変換される場合を示している。
【００１３】
変換方法として変形離散コサイン変換を用いる場合には、Ｎ個の入力ごとに過去２×Ｎ個の入力オーディオサンプルを変換してＮ個の周波数領域の係数を得る。時間／周波数変換処理を行う直前にハミング窓やハニング窓などの窓関数をかけても良い。Ｎの値は、時間／周波数変換アルゴリズムに適用できるどのようなものを適用しても良いが、１２８から２０４８の間の値を使うと最も効果が高い。また、入力信号の性質に応じて適応的にＮの値を切り替えても良い。例えば、通常時Ｎ＝２０４８としておき、入力音が過渡的な時Ｎ＝５１２，更に大きく過渡的であったときにはＮ＝１２８としても良い。適応的に切り替える場合、次のフレームのＮの値を符号化して復号化装置に送る。
符号化モデル選択部２０
時間／周波数変換部１０で得られた周波数領域での一連の係数は、符号化モデル選択部２０へ送られる。符号化モデル選択部２０では、一連の周波数領域係数を例えば図５に示すように複数の係数毎にまとめることによって構成された小帯域ごとに、予め決めた複数の符号化方法のうち、どの符号化方法を用いるのが最適かを判断し、その符号化方法を選択する選択情報を出力する。図５に示す各小帯域に付けられた番号ｎ（ｃ）（ただしｎ＝０，…，Ｎ−１及びｃ＝０，１）の括弧内の番号ｃはチャネル番号を表している。この実施例では、符号化モデルとして、ベクトル量子化を用いるタイプとスカラ量子化を用いるタイプの２つが選択できる場合を示している。それぞれの符号化モデルの詳細は、後述する。
【００１４】
小帯域を構成する周波数領域係数の数は、一定数でも良いし、その小帯域が属する周波数によって変化させても良い。後者の場合には、周波数の低い領域では係数の数を少なく、周波数の高い領域では係数の数を多くとると、後段での符号化を能率よく行うことができる。
符号化モデル選択部２０におけるベクトル量子化タイプとスカラ量子化タイプの選択は、その小帯域の性質を利用して判断すれば良い。図３Ａ，３Ｂ，３Ｃ，３Ｄのそれぞれに選択アルゴリズムの例を示す。例えば図３Ａに示すように、各小帯域に含まれるＭ個の周波数領域係数ｘ_ｉ，ｉ＝０，…，Ｍ−１の平坦度Ｓを次式
【００１５】
【数１】

のように計算し（ステップＳ１１）、平坦度Ｓを予め決めた閾値Ｓ_ｔｈと比較し（ステップＳ１２）、ＳがＳ_ｔｈより小さいとき、例えば閾値Ｓ_ｔｈ＝０．５を下回るときにはスカラ量子化タイプを選択し（ステップＳ１３）、そうでない時にはベクトル量子化タイプを選択し（ステップＳ１４）、各小帯域に対し選択されたタイプを表す選択情報を出力する（ステップＳ１５）。
【００１６】
あるいは、図３Ｂに示すように、小帯域に含まれるＭ個の周波数領域係数ｘ_ｉを、その大きさで強弱の２系統ｘ_ｊ，ｊ＝０，…，Ｊ−１及びｘ_ｋ，ｋ＝０，…，Ｋ−１（ただしｊ≠ｋ，Ｊ＋Ｋ＝Ｍ）に分け（ステップＳ２１）、それぞれの系統に属する係数のパワーＰ_Ｊ＝Σ｜ｘ_ｊ｜^２とＰ_Ｋ＝Σ｜ｘ_ｋ｜^２の比Ｐ_Ｊ／Ｐ_Ｋを計算し（ステップＳ２２）、このパワーの比を一定の値Ｃと比較し（ステップＳ２３）、例えばＣ＝２．０を超えた場合にはスカラ量子化タイプを選択し（ステップＳ２４）、そうでないときにはベクトル量子化タイプを選択し（ステップＳ２５）、各小帯域に対し決定した選択情報を出力する（ステップＳ２６）。
【００１７】
あるいは、各入力フレームに対して得られた周波数領域係数の一連の小帯域のうち、低次側（低周波数側）の小帯域はスカラ量子化タイプ、高次側（高周波数側）の小帯域はベクトル量子化タイプで符号化を行うことを前提とし、低次側小帯域に対するスカラ量子化に必要なビット数が予め決めたある一定数に達する小帯域を、ベクトル量子化との切り替わりの点とする方法を用いても良い。即ち、図３Ｃに示すように、ｎ＝０，Ｂ＝０を初期値とし（ステップＳ３１）、ｎ番目の小帯域ｎ（ｃ）に対しスカラ量子化を選択し、予め決めた量子化精度を満たすスカラ量子化ビット数ｂ_ｎを割り当て（ステップＳ３２）、割り当てたビット数ｂ_ｎを累積し（ステップＳ３３）、割り当てビット数の累積値Ｂが所定値Ｂ_Ｓより小さいか判定し（ステップＳ３４）、小さければｎを１増加させステップＳ３２に戻り（ステップＳ３５）、ＢがＢ_Ｓ以上であれば残りの小帯域に対しベクトル量子化を選択する（ステップＳ３６）。この図３Ｃの例では、選択情報を送る替わりにどの周波数又はどの小帯域から符号化方法が変わるかを表す情報を送ればよい。
【００１８】
更に、小帯域ごとにどちらの符号化モデルを用いるか予め決めておいても良い。例えば図３Ｄに示すように、周波数F=4kHzよりも低い周波数に属する小帯域に対しスカラ量子化タイプを選択し（ステップＳ４１）、それより高い周波数に属する小帯域に対してはベクトル量子化タイプを選択する（ステップＳ４２）と予め決めても良い。この場合は、選択情報を復号化装置に送る必要はない。
なお、上記の選択方法のうち複数の方法を組み合わせても良い。例えば、図３ＤのステップＳ４１に従って2kHzよりも低い周波数に属する小帯域に対しては必ずスカラ量子化タイプで行うこととし、2kHzから7kHzの間に属する小帯域に対しては、上記図３Ｂの方法により量子化方法を選択し、7kHzよりも高い周波数に属する小帯域に対しては図３ＤのステップＳ４２に従って必ずベクトル量子化タイプを選択することとする、などの方法で決定しても良い。図３Ａ及び３Ｂの場合は、各小帯域に対し、２つの符号化モデルのいずれの一方が適用されるかを指定する選択情報が生成される。
【００１９】
このようにして決められた符号化モデルの選択情報は量子化・符号化部３０に送るとともに、符号化されて選択情報符号ＱＭＳＣとして多重化部４０に送られる。この実施例では、選択できる符号化モデルは２種類であるため、選択情報は各小帯域ごとに高々１ビットあれば符号化できる。もちろん、符号化モデルは２種類より多くてもよい。また、エントロピー符号化やランレングス符号化などを用いて、選択情報を可逆圧縮しても良い。
量子化・符号化部３０
量子化・符号化部３０では、符号化モデル選択部２０で決定された符号化方法を用いて、小帯域ごとに量子化及び符号化を行う。
【００２０】
図４に量子化・符号化部３０の詳細を示す。量子化・符号化部３０は重み計算部３１と、係数振り分け部３２と、重み振り分け部３３と、ベクトル量子化部３４と、スカラ量子化部３５と、可逆圧縮部３６とから構成されている。重み計算部３１では、入力された各小帯域の周波数領域係数に対し、量子化の重みを計算する。重みとしては、線形予測スペクトルを用いる方法、小帯域ごとに平均値あるいは最大値を求めこれを重みとする方法、あるいは、その組み合わせなどを用いることができる。
【００２１】
重み計算部３１で計算された重みは、重み振り分け部３３に送られる。また、重みは、復号化装置１０Ｂでも使用されるため、符号化され、重み符号ＷＣとして他の符号と共に多重化部４０を介して復号化装置１０Ｂに送られる。線形予測スペクトルは、線形予測係数をＬＳＰ係数に変換し符号化することにより、高能率に符号化することができる。また、小帯域の代表値は、その値を量子化することにより符号化できる。量子化の方法としては、ベクトル量子化を用いても良いし、スカラ量子化しても良い。量子化した量子化インデックスはエントロピー符号化などの可逆圧縮を加えて符号化しても良い。圧縮しない場合の符号は、量子化インデックスを２進数で表すことにより得られる。
【００２２】
係数振り分け部３２では、小帯域ごとの周波数領域係数を入力とし、符号化モデル選択部２０より送られた符号化モデル選択情報に基づき、小帯域ごとに係数をベクトル量子化部３４とスカラ量子化部３５に振り分ける。符号化モデル情報は、小帯域ごとに定められた２値の値なので、この値に従って対応する符号化モデルに係数を振り分ければよい。図５の例では、選択情報に従ってチャネル０の小帯域０（０）〜３（０）はスカラ量子化に振り分けられ、小帯域４（０），５（０）はベクトル量子化に振り分けられ、また、チャネル１の小帯域０（１），１（１），２（１），４（１）がスカラ量子化に振り分けられ、小帯域３（１），５（１）がベクトル量子化に振り分けられた場合を示している。
【００２３】
重み振り分け部３３では、係数振り分け部３２と同じ方法で重みを振り分ける。ベクトル量子化部３４では、係数振り分け部３２及び重み振り分け部３３より送られた重みを使って係数振り分け部３２からの係数をベクトル量子化する。量子化に先立ち、入力された係数をまとめ１個以上の量子化ユニットを構成する。量子化ユニットは、全係数を全て格納して１つだけ構成しても良いし、図５で示した例のステレオ符号化の場合には、ベクトル量子化ユニットＶＱ−ＣＵ０で示すようにチャネルごとに１つの量子化ユニットを構成しても良い。また、係数を一定数ごとに分割し、多数の量子化ユニットを構成しても良い。
【００２４】
ベクトル量子化は、量子化ユニットごとに行われるが、量子化ユニットを一括してベクトル量子化しても良いし、分割してベクトル量子化しても良い。分割の方法としては、複数の領域に分ける方法、あるいは入力係数をインタリーブしてから分割する方法などを用いることができる。最適ベクトルの選択は、コードブック中のコードベクトルに重み振り分け部３３からの重みを乗算し、目標となる係数ベクトルに最も近くなるものを選択することにより行う。ベクトル量子化の形態としては、通常のベクトル量子化の他、２つのコードブックから選択したコードベクトルの和を用いる共役構造ベクトル量子化、あるいは多段ベクトル量子化などの形態を用いても良い。このようにして決定されたコードベクトルインデックスを２進数で表すことにより符号化を行い、ベクトル量子化符号ＶＱＣを多重化部４０に送る。
【００２５】
スカラ量子化部３５では、重み振り分け部３３からの重みを使って係数振り分け部３２からの小帯域ごとの周波数領域係数をスカラ量子化し、量子化インデックスは可逆圧縮部３６へ送られる。量子化に先立ち、入力された係数をまとめ１個以上の量子化ユニットを構成する。１小帯域ごとに１ユニットを構成すると良好な結果が得られる。量子化値は、重み振り分け部３３からの重みを掛け合わせて周波数領域係数と最も近づくような２進値を所望の量子化精度で決定する。この量子化の際に、量子化精度（例えばビット数）を決定する必要があるが、これは量子化ユニットごとに設定する。周波数領域係数のパワーとスペクトル形状に基づき決定された、最低限保証する必要がある量子化誤差から決定することが望ましい。
【００２６】
ここで決定された量子化精度情報も何らかの形で符号化し復号器に送る必要がある。最も簡単な方法としては、量子化精度を満たすのに必要な量子化ビット数を符号化することが挙げられる。その他、可逆圧縮部３６でエントロピー符号化を行う場合には、ハフマン符号化ならハフマン符号テーブル、算術符号化ならシンボルの出現頻度テーブルにより量子化精度を与えることができるので、このテーブルの種類を符号化することにより量子化精度情報を復号器に送ることができる。
【００２７】
なお、前述のベクトル量子化における重み付け及び上述のスカラ量子化における重み付けは、図１で説明した平坦化部１２による平坦化と本質的に同じである。即ち、図４に示したベクトル量子化におけるコードベクトル又はスカラ量子化における量子化値に重みを乗算する代わりに、周波数領域係数を重みで割り算し（即ち、平坦化し）、その割り算結果をベクトル量子化又はスカラ量子化しても処理手順が異なるだけで符号化結果は同じである。
【００２８】
可逆圧縮部３６では、スカラ量子化部３５で得られた量子化インデックスに可逆圧縮符号化を行ない、圧縮スカラ量子化符号CSQCを多重化部４０に供給する。可逆圧縮符号化に先立ち、１つ以上の量子化ユニットをまとめ、符号化ユニットを構成する。図５の例では、チャネル０の処理において２つの量子化ユニットSQU0，SQU1をまとめて１つの符号化ユニットSQCU0とし、２つの量子化ユニットSQU2，SQU3をまとめて１つの符号化ユニットSQCU1とし、チャネル１についても同様の処理を行っている。
【００２９】
符号化ユニットは量子化ユニットを一定数ずつまとめて構成しても良いし、似た量子化精度をもつ量子化ユニット同士をまとめて符号化ユニットを構成しても良い。後者の場合、符号化ユニットの構成情報を符号化して復号器に送る必要がある。可逆圧縮の方法としては、ハフマン符号化や算術符号化などのエントロピー符号化の他、量子化値０が長く続く場合には、ランレングス符号化を用いても効果がある。エントロピー符号化を行う場合、ハフマン符号化なら、ハフマン符号テーブル、算術符号化ならシンボルの出現頻度のテーブルを与える必要がある。このテーブルは符号化ユニットごとに与える。
【００３０】
多重化部４０は符号化モデル選択情報符号QMSC、重み符号WC、ベクトル量子化符号VQC 、圧縮スカラ量子化符号CSQCを多重化し、多重化符号ビット列として出力し、例えば記憶媒体に書き込んだり、あるいは他の装置に送信する。
次にこの発明の符号化方法により符号化された符号を復号する復号化装置１０Ｂについて説明する。
図２に示したように、入力された多重化符号ビット列は逆多重化部５０において分離され、選択情報QMSC、重み符号WC、ベクトル量子化符号VQC 、圧縮スカラ量子化符号CSQCを得る。選択情報QMSCは符号化モデル選択再生部６０において選択情報に再生され、係数再生部７０に与えられる。
符号化モデル選択再生部６０
符号化モデル選択再生部６０では、符号化モデル選択符号QMSCのビット列が入力され、符号化モデル選択情報が再生される。符号化モデル選択情報が可逆圧縮されている場合、圧縮方法に対する復号化を行い、符号化モデル選択の２値情報を得る。可逆圧縮がかけられていない場合には、ビット列を２値の整数化して、符号化モデルの選択情報とする。このようにして得られた符号化モデルの選択情報は、係数再生部７０に送られる。
係数再生部７０
係数再生部７０では、逆多重化部５０から、符号化装置１０Ａにおける量子化・符号化部３０の符号化出力である係数符号（ベクトル量子化符号VQC と、圧縮スカラ量子化符号CSQCと、重み符号WC）と選択情報が与えられ、選択情報により指定された符号化方法に対応する復号化を行って周波数領域係数を再生し、周波数／時間変換部８０に与える。
【００３１】
図６に係数再生部７０の詳細を示す。係数再生部７０は可逆圧縮復号化部７１と、スカラ量子化再生部７２と、周波数領域再構築部７３と、ベクトル量子化再生部７４と、重み再生部７５と、重み付け部７６とから構成されている。
逆多重化部５０により分離された符号ビット列のうち、ベクトル量子化符号ＶＱＣのビット列は、ベクトル量子化再生部７４において、２進数を整数表現することにより量子化インデックスに復元され、コードブックを参照して対応するベクトルを読み出し、そのベクトルを重みなし小帯域係数として再生する。符号化装置１０Ａにおけるベクトル量子化部３４を複数のベクトル量子化により構成した場合には、符号化装置１０Ａのベクトル量子化部３４と同じ規則を用いて再生されたベクトルをベクトル逆量子化により再構築し、重みなし小帯域係数を得る。再生された重みなし小帯域係数は、周波数領域再構築部７３に送られる。
【００３２】
逆多重化部５０からの圧縮スカラ量子化符号ＣＳＱＣのビット列は、可逆圧縮復号化部７１において、符号化装置側で可逆圧縮符号化した手法に対応する復号化を行うことによりスカラ量子化インデックスを得、スカラ量子化再生部７２に送られる。
スカラ量子化再生部７２では、可逆圧縮復号化部７１より受け取ったスカラ量子化インデックスを量子化ユニットごとにスカラ逆量子化して量子化値に復元することにより重みなし小帯域係数を再生する。再生した重みなし小帯域係数は、周波数領域再構築部７３に送られる。
【００３３】
周波数領域再構築部７３では、スカラ量子化再生部７２及びベクトル量子化再生部７４より送られた量子化ユニットごとの重みなし小帯域係数を、符号化モデル選択情報再生部６０より受け取った符号化モデル選択情報に従って重みなし周波数領域係数に再構築する。
重み再生部７５では、逆多重化部５０から重み符号ＷＣのビット列を受け取り、重みを再生する。重み付け部７６では、周波数領域再構築部７３で構築した重みなし周波数領域係数に重み再生部７５で得た重みを乗算して周波数領域係数を得る。
周波数／時間変換部８０
周波数／時間変換部８０では、係数再生部７０からの周波数領域係数に対し周波数／時間変換を行いオーディオ信号を出力する。周波数／時間変換の方法としては、逆離散コサイン変換（ＩＤＣＴ）や、逆変形離散コサイン変換（ＩＭＤＣＴ）を用いることができる。変換方法として逆変形離散コサイン変換を用いる場合には、Ｎ個の入力係数を変換して２Ｎ個の時間領域のサンプルを得る。このサンプルに、窓関数を掛けた後、現フレームの前半Ｎサンプルと一つ前のフレームの後半Ｎサンプル同士を加え合わせて得られたＮサンプルを出力とする。
【００３４】
Ｎの値は、時間／周波数変換アルゴリズムに適用できるどのようなものを適用しても良いが、１２８から２０４８の間の値を使うと最も効果が高い。また、符号器で入力信号の性質に応じて適応的にＮの値を切り替えた場合、例えば、通常時Ｎ＝２０４８としておき、入力音が過渡的な時Ｎ＝５１２、更に大きく過渡的であったときにはＮ＝１２８とした場合、符号化装置から渡されたＮの情報に従ってＮの値を決定する。
【００３５】
図７はこの発明による符号化方法及びこの符号化方法により符号化された符号を復号する復号化方法をコンピュータで実施する場合の構成を示し、コンピュータ１００は、バス１８０を介して互いに接続されたＣＰＵ１１０，ＲＡＭ１２０、ＲＯＭ１３０，入出力インタフェース１４０、ハードディスク１５０を含んでいる。ＲＯＭ１３０にはコンピュータ１００を動作させる基本プログラムが格納されており、ハードディスク１５０には前述したこの発明による符号化方法及びこの符号化方法により符号化された符号を復号する復号化方法を実行するプログラムが予め格納されている。
【００３６】
例えば符号化時にはＣＰＵ１１０はハードディスク１５０から符号化プログラムをＲＡＭ１２０にロードし、インタフェース１４０から入力されたオーディオ信号サンプルを符号化プログラムに従って処理することにより符号化し、インタフェース１４０から出力する。復号時には、復号プログラムをハードディスク１５０からＲＡＭ１２０にロードし、入力符号を復号プログラムに従って処理してオーディオ信号サンプルを出力する。
【００３７】
この発明による符号化方法及びこの符号化方法により符号化された符号を復号する復号化方法を実行するプログラムは、内部バス１８０に駆動装置１６０を介して接続された外部ディスク装置１７０に記録されたものを使用してもよい。あるいは、インタフェース１４０を介して外部ネットワークからプログラムをダウンロードしてハードディスク１５０に格納したものでもよい。この発明による符号化方法を実行するプログラムが記録された記憶媒体としては、磁気記録媒体や、ＩＣメモリや、コンパクトディスクなどのような形態の記憶媒体であってもよい。
【００３８】
【発明の効果】
本発明を利用すると、低いビットレートでのオーディオ信号の符号化において、入力音の特性に適応した高能率な符号化を可能とする。
【図面の簡単な説明】
【図１】ベクトル量子化利用の変換符号化方法の一般的な形態を示すブロック図。
【図２】本発明の実施例の構成を示すブロック図。
【図３】Ａは符号化モデル選択アルゴリズムの一例を示すフロー図、Ｂは他の選択アルゴリズムを示すフロー図、Ｃは更に他の選択アルゴリズムを示すフロー図、Ｄは更に他の選択アルゴリズムを示すフロー図。
【図４】本発明の実施例中の量子化・符号化部の詳細な構成を示すブロック図。
【図５】入力周波数領域係数と、小帯域と、符号化モデル選択と、量子化ユニットと、符号化ユニットの構成例を示す図。
【図６】本発明の実施例中の、係数再生部の詳細な構成を示すブロック図。
【図７】この発明の符号化方法及びこの符号化方法により符号化された符号を復号する復号化方法をプログラムにより実施するためのコンピュータの構成を示すブロック図。
[0001]
[Industrial applications]
The present invention provides an encoding method that converts an audio signal into a digital code with high efficiency, and a storage medium on which a program for implementing the method is recorded. It can be used for recording and reproduction of audio signals, transmission of audio signals over a communication channel, broadcasting, and the like.
[0002]
[Prior art]
As a conventional method of encoding an audio signal with high efficiency, there is, for example, the transform coding method shown in FIG. 1. In the encoding device 10A, the audio signal input as a discrete sample sequence is subjected to time / frequency conversion by the time / frequency conversion unit 11 each time a fixed number of samples has been input, yielding a series of coefficients in the frequency domain (hereinafter referred to as frequency domain coefficients), which are then encoded. In the example of FIG. 1, the frequency domain coefficients are flattened by the flattening unit 12 and then quantized by the vector quantization unit 13; the code representing the flattening information used by the flattening unit 12 and the code produced by vector quantization in the vector quantization unit 13 are multiplexed by the multiplexing unit 14 and output.
[0003]
In the decoding device 10B, the demultiplexing unit 15 separates the received multiplexed code into the code representing the flattening information and the vector code; the vector quantization decoding unit 16 obtains flattened frequency domain coefficients from the vector code; the inverse flattening unit 17 uses the flattening information to inverse-flatten the flattened frequency domain coefficients and reproduce the frequency domain coefficients; and the frequency / time conversion unit 18 converts the reproduced frequency domain coefficients into a time domain signal and outputs it.
A frequency-domain weighted interleave vector quantization (Transform-domain Weighted Interleave Vector Quantization, TWINVQ) scheme applies to this example. In the TWINVQ method, after an input signal is subjected to time / frequency conversion, it is subjected to a two-stage flattening procedure, and then is encoded by vector quantization. In the vector quantization, a code vector having a minimum distance from a target vector is selected from a code book, and the decoding apparatus uses the code vector to reproduce the vector. In such an encoding method, although the distance between the reproduction vector and the target vector is small and encoding can be performed with high efficiency, it is difficult to control the error of each element in the vector from the target value. Therefore, in an encoding method using vector quantization, the frequency characteristics of a signal reproduced on the decoding device side may be distorted from the frequency characteristics of the original signal, resulting in deterioration of reproduced sound quality. When the shape of the spectrum is complicated, for example, when an input signal contains a strong tone component, a large load is imposed on the vector quantization, and this property is likely to appear strongly, which causes deterioration in sound quality.
[0004]
A technique for solving this problem has been proposed in Japanese Patent Application No. 2000-078370, "Audio signal encoding method and decoding method, devices therefor, and program storage medium". There, when the spectrum of the sound source has a complex shape, the frequency domain coefficients are separated into two groups, strong and weak, and weighted vector quantization is performed on each group, which enables finer, per-sample control of the quantization error. However, this method requires a relatively large number of bits to encode the separation information of the frequency domain coefficients, and it can be inefficient particularly in the low frequency region, where strong and weak coefficients occur with nearly equal probability.
[0005]
As another solution, there are methods in which only specific samples that strongly affect vector quantization are removed in advance and encoded separately; these have been filed as Japanese Patent Application No. Hei 7-261236, "Audio signal transform coding method and decoding method", and No. Hei 7-248145, "Transform coding method and transform decoding method". In the former application, samples of high importance are removed in advance and the rest are vector-quantized; however, the positions of the removed samples on the frequency axis must be encoded, so the coding efficiency is poor. In the latter method, for a pitched sound source whose spectrum has periodic spikes, the samples located at integer multiples of the fundamental frequency of the spikes are removed and encoded separately. This method has good coding efficiency, but it has little effect on sound sources with a non-integer overtone structure, such as a triangle, and thus lacks versatility.
[0006]
The TWINVQ method is described in detail in Iwakami et al., "Musical sound coding by transform-domain weighted interleave vector quantization (TWINVQ)," IEICE Transactions, Vol. J80-A, pp. 830-837, and in the ISO/IEC standard ISO/IEC 14496-3, Information Technology: Coding of Audio-Visual Objects (MPEG-4 Audio). Details of vector quantization technology in general are given in Furui et al., "Vector Quantization and Information Compression" (Corona Publishing, 1998).
[0007]
[Problems to be solved by the invention]
An object of the present invention is to provide, in a transform coding method for audio signals that encodes the spectrum of an input signal, a method for encoding spectra having complicated shapes efficiently and with high versatility.
[0008]
[Means for Solving the Problems]
The encoder converts the input signal sequence into the frequency domain at fixed time intervals and divides the resulting frequency domain coefficients into small bands, each grouping neighboring coefficients. For each small band, a model for encoding the coefficients is determined, and the coefficients belonging to that small band are quantized and encoded by the determined method.
[0009]
[Action]
For example, suppose there are two encoding models, one using vector quantization and one using scalar quantization. Vector quantization quantizes efficiently even with a small number of bits, but it makes per-sample control of distortion difficult. Scalar quantization treats each sample independently, but its efficiency degrades sharply when the number of quantization bits is insufficient. The present invention therefore provides a plurality of encoding models with different properties and selects the optimal model for each small band of the input spectrum, which enables versatile and highly efficient encoding.
[0010]
[Embodiments of the Invention]
FIG. 2 shows a first embodiment of the present invention. In the embodiment of FIG. 2, the encoding device 10A includes a time / frequency conversion unit 10, an encoding model selection unit 20, a quantization / encoding unit 30, and a multiplexing unit 40; a discrete sample sequence such as an audio signal is input to the terminal 9, and the encoded bit string is output from the multiplexing unit 40. That is, the input signal is converted into frequency domain coefficients by the time / frequency conversion unit 10 using, for example, the modified discrete cosine transform (MDCT). The encoding model selection unit 20 then selects, for each small band obtained by dividing the series of frequency domain coefficients into groups of several coefficients, which of a plurality of predetermined encoding methods is to be used, gives the selection information to the quantization / encoding unit 30, and gives the code of the selection information to the multiplexing unit 40. The quantization / encoding unit 30 encodes the frequency domain coefficients by the encoding method specified by the selection information, and the resulting code of the frequency domain coefficients (hereinafter referred to as the coefficient code) is given to the multiplexing unit 40. The multiplexing unit 40 multiplexes the selection information code and the coefficient code and outputs the result as a bit string.
[0011]
The decoding device 10B includes a demultiplexing unit 50, an encoding model selection information reproduction unit 60, a coefficient reproduction unit 70, and a frequency / time conversion unit 80; it decodes the input multiplexed code bit string and outputs an audio signal, which is a discrete sample sequence in the time domain, from the output terminal 91. That is, the demultiplexing unit 50 separates the input multiplexed code bit string into the selection information code of the encoding model and the coefficient code; the selection information code is given to the selection information reproduction unit 60, where it is reproduced into the selection information, and the selection information is given to the coefficient reproduction unit 70. The coefficient code from the demultiplexing unit 50 is also given to the coefficient reproduction unit 70, where the decoding method corresponding to the encoding method selected for each small band by the encoding model selection unit 20 of the encoding device 10A is chosen according to the selection information, the coefficient code is decoded, and the frequency domain coefficients are reproduced. The frequency domain coefficients are converted into a time domain sample sequence by the frequency / time conversion unit 80 and output to the terminal 91 as the decoding result.
[0012]
Next, each unit of the encoding device 10A will be described in detail with reference to the processing on the frequency domain coefficients shown in FIG.
Time / frequency conversion unit 10
The discrete sample sequence of the audio signal input to the terminal 9 is supplied to the time / frequency conversion unit 10, which performs time / frequency conversion for every fixed number N of input samples (taken as one frame) and converts them into N coefficients in the frequency domain. As the time / frequency conversion method, the discrete cosine transform (DCT) or the modified discrete cosine transform (MDCT) can be used. In the example shown in FIG. 5, each circle represents one coefficient value, and a two-channel audio signal is converted into frequency domain coefficients alternately, frame by frame.
[0013]
When the modified discrete cosine transform is used as the conversion method, N frequency domain coefficients are obtained by transforming the past 2 × N input audio samples each time N samples are input. A window function such as a Hamming window or a Hanning window may be applied immediately before the time / frequency conversion. Any value of N that the time / frequency conversion algorithm supports may be used, but values between 128 and 2048 are most effective. The value of N may also be switched adaptively according to the properties of the input signal: for example, N = 2048 normally, N = 512 when the input sound is transient, and N = 128 when it is even more strongly transient. When N is switched adaptively, the value of N for the next frame is encoded and sent to the decoding device.
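The frame structure described above (2N input samples in, N coefficients out) can be illustrated with a direct, unoptimized MDCT. This is only a sketch under stated assumptions: the function name `mdct`, the sine window, and the absence of any scaling factor are illustrative choices, not details given in the patent text, which mentions Hamming or Hanning windows as options.

```python
import math

def mdct(block):
    """Direct O(N^2) MDCT: 2N time samples -> N frequency domain coefficients."""
    two_n = len(block)
    assert two_n % 2 == 0, "block must hold 2N samples"
    n = two_n // 2
    # Sine window: an assumption of this sketch, not taken from the patent.
    windowed = [x * math.sin(math.pi * (t + 0.5) / two_n)
                for t, x in enumerate(block)]
    coeffs = []
    for k in range(n):
        acc = 0.0
        for t, x in enumerate(windowed):
            acc += x * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
        coeffs.append(acc)
    return coeffs

# One frame with N = 8: the transform consumes the past 2N = 16 samples.
frame = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
coeffs = mdct(frame)
```

In a real encoder the block would advance by N samples per frame so that consecutive blocks overlap by half, and a fast algorithm would replace the double loop.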
Encoding model selection unit 20
The series of frequency domain coefficients obtained by the time / frequency conversion unit 10 is sent to the encoding model selection unit 20. For each small band formed by grouping the series of frequency domain coefficients several coefficients at a time, for example as shown in FIG. 5, the encoding model selection unit 20 determines which of a plurality of predetermined encoding methods is optimal and outputs selection information that selects that encoding method. In the label n (c) attached to each small band in FIG. 5 (where n = 0, ..., N−1 and c = 0, 1), the number c in parentheses is the channel number. This embodiment shows the case where two encoding models can be selected: a type using vector quantization and a type using scalar quantization. Details of each encoding model are described later.
[0014]
The number of frequency domain coefficients constituting the small band may be a fixed number or may be changed according to the frequency to which the small band belongs. In the latter case, if the number of coefficients is small in a low-frequency region and the number of coefficients is large in a high-frequency region, encoding at the subsequent stage can be performed efficiently.
The choice between the vector quantization type and the scalar quantization type in the encoding model selection unit 20 may be made using the properties of the small band. FIGS. 3A, 3B, 3C, and 3D each show an example of a selection algorithm. For example, as shown in FIG. 3A, the flatness S of the M frequency domain coefficients x_i, i = 0, ..., M−1, included in each small band is calculated by the following equation
[0015]
(Equation 1)

(step S11), and the flatness S is compared with a predetermined threshold S_th (step S12). When S is smaller than S_th, for example when it falls below a threshold S_th = 0.5, the scalar quantization type is selected (step S13); otherwise the vector quantization type is selected (step S14). Selection information indicating the type selected for each small band is then output (step S15).
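Equation 1 is not reproduced in this text, so the exact flatness measure is unknown here. As a hedged sketch of steps S11 to S14, the code below substitutes a common spectral flatness measure (geometric mean of the power divided by its arithmetic mean), which lies between 0 and 1 and is small for peaky bands. That choice, the small constant `eps`, and the function names are assumptions of this example and may differ from the patent's actual formula.

```python
import math

def flatness(coeffs, eps=1e-12):
    """Stand-in for Equation 1 (ASSUMPTION): geometric / arithmetic mean of power."""
    powers = [c * c + eps for c in coeffs]  # eps avoids log(0)
    log_mean = sum(math.log(p) for p in powers) / len(powers)
    return math.exp(log_mean) / (sum(powers) / len(powers))

def select_by_flatness(coeffs, s_th=0.5):
    """Steps S12 to S14: scalar quantization when S < S_th, otherwise vector."""
    return "scalar" if flatness(coeffs) < s_th else "vector"

peaky = [0.01, 0.02, 9.0, 0.01]  # one strong tonal line in the small band
flat = [1.0, 1.1, 0.9, 1.0]      # nearly uniform small band
choices = [select_by_flatness(peaky), select_by_flatness(flat)]
```

With this stand-in measure, the band dominated by a single strong coefficient falls below the threshold and is routed to scalar quantization, while the nearly uniform band is routed to vector quantization.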
[0016]
Alternatively, as shown in FIG. 3B, the M frequency domain coefficients x_i included in the small band are divided by magnitude into two groups, strong and weak, x_j, j = 0, ..., J−1 and x_k, k = 0, ..., K−1 (where j ≠ k and J + K = M) (step S21); the ratio P_J / P_K of the powers P_J = Σ|x_j|² and P_K = Σ|x_k|² of the coefficients in each group is calculated (step S22); and this power ratio is compared with a constant value C (step S23). When the ratio exceeds, for example, C = 2.0, the scalar quantization type is selected (step S24); otherwise the vector quantization type is selected (step S25). The selection information determined for each small band is then output (step S26).
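The FIG. 3B procedure can be sketched as follows. The text does not say how the strong and weak groups are separated, so this example assumes a split at the mean magnitude of the small band; that rule, the handling of the degenerate cases, and the function name are illustrative assumptions only.

```python
def select_by_power_ratio(coeffs, c=2.0):
    """Steps S21 to S25 with an assumed split at the mean magnitude."""
    mean_mag = sum(abs(x) for x in coeffs) / len(coeffs)
    strong = [x for x in coeffs if abs(x) > mean_mag]   # group x_j
    weak = [x for x in coeffs if abs(x) <= mean_mag]    # group x_k
    if not strong:
        # All magnitudes are equal, so nothing stands out (assumed handling).
        return "vector"
    p_j = sum(x * x for x in strong)
    p_k = sum(x * x for x in weak)
    if p_k == 0.0:
        # Weak group carries no power: treat the ratio as exceeding C.
        return "scalar"
    return "scalar" if p_j / p_k > c else "vector"

tonal = select_by_power_ratio([0.1, 0.1, 5.0, 0.1])
smooth = select_by_power_ratio([1.0, 1.2, 0.9, 1.1])
```

For the second band the strong group holds 1.2 and 1.1 and the weak group holds 1.0 and 0.9, giving a power ratio of roughly 1.46, which stays below C = 2.0.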
[0017]
Alternatively, on the premise that, among the series of small bands of frequency domain coefficients obtained for each input frame, the small bands on the low-order (low frequency) side are encoded with the scalar quantization type and those on the high-order (high frequency) side with the vector quantization type, the small band at which the number of bits required for scalar quantization of the low-order small bands reaches a predetermined fixed number may be taken as the point of switching to vector quantization. That is, as shown in FIG. 3C, n = 0 and B = 0 are set as initial values (step S31); scalar quantization is selected for the n-th small band n (c) and a number of scalar quantization bits b_n that satisfies a predetermined quantization accuracy is allocated to it (step S32); the allocated number of bits b_n is added to the running total B (step S33); it is determined whether the accumulated number of allocated bits B is smaller than a predetermined value B_S (step S34); if it is smaller, n is incremented by 1 and the process returns to step S32 (step S35); and if B is equal to or greater than B_S, vector quantization is selected for the remaining small bands (step S36). In the example of FIG. 3C, instead of sending the selection information, it suffices to send information indicating the frequency or the small band at which the encoding method changes.
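The loop of steps S31 to S36 amounts to a running sum with an early exit. The sketch below assumes the per-band bit counts b_n have already been computed by some accuracy criterion that the text leaves open; the function name and the sample numbers are hypothetical.

```python
def find_switch_band(bits_per_band, b_s):
    """Return the index of the first small band coded with vector quantization.

    bits_per_band[n] is b_n, the scalar quantization bits needed by band n.
    """
    total = 0                      # B = 0 (step S31)
    for n, b_n in enumerate(bits_per_band):
        total += b_n               # allocate and accumulate (steps S32, S33)
        if total >= b_s:           # budget B_S reached (step S34)
            return n + 1           # remaining bands use vector quantization (S36)
    return len(bits_per_band)      # budget never reached: all bands scalar

bits = [40, 30, 30, 20, 20]        # hypothetical b_n values
switch = find_switch_band(bits, b_s=90)
modes = ["scalar" if n < switch else "vector" for n in range(len(bits))]
```

Only the single value `switch` would need to be conveyed to the decoder in this scheme, in place of one flag per small band.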
[0018]
Further, which encoding model to use for each small band may be determined in advance. For example, as shown in FIG. 3D, it may be decided in advance that the scalar quantization type is selected for small bands belonging to frequencies lower than F = 4 kHz (step S41) and the vector quantization type is selected for small bands belonging to higher frequencies (step S42). In this case, there is no need to send the selection information to the decoding device.
Note that several of the above selection methods may be combined. For example, small bands belonging to frequencies lower than 2 kHz may always use the scalar quantization type in accordance with step S41 of FIG. 3D, small bands between 2 kHz and 7 kHz may have their quantization method selected by the method of FIG. 3B, and small bands belonging to frequencies higher than 7 kHz may always use the vector quantization type in accordance with step S42 of FIG. 3D. In the cases of FIGS. 3A and 3B, selection information specifying which of the two encoding models is applied is generated for each small band.
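A combined rule of the kind described above can be sketched as a frequency gate around an adaptive decision. The use of a band center frequency, the callback parameter `adaptive_rule`, and the simple `peak_rule` stand-in for the FIG. 3B decision are all assumptions made for this illustration.

```python
def select_combined(center_hz, coeffs, adaptive_rule,
                    low_hz=2000.0, high_hz=7000.0):
    """Fixed choice below low_hz and above high_hz, adaptive choice in between."""
    if center_hz < low_hz:
        return "scalar"            # always scalar (FIG. 3D, step S41)
    if center_hz > high_hz:
        return "vector"            # always vector (FIG. 3D, step S42)
    return adaptive_rule(coeffs)   # per-band decision, e.g. FIG. 3B

def peak_rule(coeffs):
    """Hypothetical stand-in for FIG. 3B: strongest line versus the rest."""
    powers = sorted((x * x for x in coeffs), reverse=True)
    return "scalar" if powers[0] > 2.0 * sum(powers[1:]) else "vector"

bands = [(500.0, [1.0, 1.0]), (4000.0, [5.0, 0.1, 0.1]),
         (4000.0, [1.0, 1.0, 1.0]), (9000.0, [5.0, 0.1])]
modes = [select_combined(f, c, peak_rule) for f, c in bands]
```

Only the bands in the adaptive middle range would need selection information to be transmitted, since the outer ranges are fixed in advance.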
[0019]
The encoding model selection information determined in this way is sent to the quantization / encoding unit 30, and is also encoded and sent to the multiplexing unit 40 as the selection information code QMSC. In this embodiment there are two selectable encoding models, so the selection information can be encoded with at most one bit per small band. Of course, there may be more than two encoding models. The selection information may also be losslessly compressed using entropy coding, run-length coding, or the like.
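As a small illustration of the one-bit-per-small-band representation and of run-length coding as one lossless option, the sketch below packs the selections into a bit string and collapses it into runs. The bit convention (1 for scalar, 0 for vector) and both function names are assumptions of this example, not the format of the actual selection information code.

```python
def pack_flags(modes):
    """One bit per small band: '1' for scalar, '0' for vector (assumed convention)."""
    return "".join("1" if m == "scalar" else "0" for m in modes)

def run_length(bits):
    """Collapse a bit string into (bit, run length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1       # extend the current run
        else:
            runs.append([b, 1])    # start a new run
    return [(b, n) for b, n in runs]

flags = pack_flags(["scalar"] * 4 + ["vector"] * 2)
runs = run_length(flags)
```

Run-length coding pays off when scalar and vector bands cluster, as they tend to when low bands are scalar and high bands are vector.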
Quantization / encoding unit 30
The quantization / encoding unit 30 performs quantization and encoding for each small band using the encoding method determined by the encoding model selection unit 20.
[0020]
FIG. 4 shows details of the quantization / encoding unit 30. The quantization / encoding unit 30 includes a weight calculation unit 31, a coefficient distribution unit 32, a weight distribution unit 33, a vector quantization unit 34, a scalar quantization unit 35, and a lossless compression unit 36. . The weight calculator 31 calculates quantization weights for the input frequency domain coefficients of each small band. As the weight, a method using a linear prediction spectrum, a method of obtaining an average value or a maximum value for each small band and using this as a weight, or a combination thereof can be used.
[0021]
The weights calculated by the weight calculation unit 31 are sent to the weight distribution unit 33. Since the weights are also used in the decoding device 10B, they are encoded, and the resulting weight code WC is sent together with the other codes to the decoding device 10B via the multiplexing unit 40. The linear prediction spectrum can be encoded efficiently by converting the linear prediction coefficients into LSP coefficients and encoding those. The representative value of a small band can be encoded by quantizing it. Either vector quantization or scalar quantization may be used as the quantization method. The resulting quantization index may be further encoded with lossless compression such as entropy coding. Without compression, the code is obtained by expressing the quantization index as a binary number.
[0022]
The coefficient distribution unit 32 receives the frequency domain coefficients of each small band and, based on the encoding model selection information sent from the encoding model selection unit 20, distributes the coefficients of each small band to either the vector quantization unit 34 or the scalar quantization unit 35. Since the encoding model selection information is a binary value defined for each small band, the coefficients are simply routed to the corresponding encoding model according to this value. The example of FIG. 5 shows the case where, according to the selection information, small bands 0 (0) to 3 (0) of channel 0 are distributed to scalar quantization and small bands 4 (0) and 5 (0) to vector quantization, while small bands 0 (1), 1 (1), 2 (1), and 4 (1) of channel 1 are distributed to scalar quantization and small bands 3 (1) and 5 (1) to vector quantization.
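The routing performed by the coefficient distribution unit can be sketched as a simple partition driven by the per-band flag. The function name `distribute`, the flag convention (1 for scalar, 0 for vector), and the coefficient values are hypothetical; the flag pattern follows the channel 1 case of FIG. 5 as described above.

```python
def distribute(subbands, selection):
    """Split small bands into vector and scalar groups, keeping band indices."""
    to_vq, to_sq = [], []
    for n, (coeffs, flag) in enumerate(zip(subbands, selection)):
        (to_sq if flag else to_vq).append((n, coeffs))
    return to_vq, to_sq

# Channel 1 in FIG. 5: bands 0, 1, 2, 4 scalar; bands 3, 5 vector.
selection = [1, 1, 1, 0, 1, 0]
subbands = [[float(n)] * 2 for n in range(6)]   # placeholder coefficients
to_vq, to_sq = distribute(subbands, selection)
vq_bands = [n for n, _ in to_vq]
sq_bands = [n for n, _ in to_sq]
```

Applying the same function to the weights, with the same selection flags, mirrors how the weight distribution unit follows the coefficient distribution unit.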
[0023]
The weight distribution unit 33 distributes the weights in the same manner as the coefficient distribution unit 32. The vector quantization unit 34 performs vector quantization on the coefficients from the coefficient distribution unit 32 using the weights from the weight distribution unit 33. Prior to quantization, the input coefficients are combined to form one or more quantization units. All of the coefficients may be gathered into a single quantization unit. Alternatively, in the case of stereo coding as in the example of FIG. 5, a quantization unit may be configured for each channel, or both channels may be combined into one quantization unit as indicated by the vector quantization unit VQ-CU0. A large number of quantization units may also be configured by dividing the coefficients into groups of a fixed number.
[0024]
Vector quantization is performed for each quantization unit; a quantization unit may be vector-quantized as a whole, or it may be divided and vector-quantized in parts. As a method of division, a method of dividing into a plurality of areas, a method of dividing after interleaving the input coefficients, or the like can be used. The optimum vector is selected by multiplying each code vector in the codebook by the weight from the weight distribution unit 33 and choosing the one closest to the target coefficient vector. As a form of vector quantization, conjugate structure vector quantization, which uses the sum of code vectors selected from two codebooks, or multi-stage vector quantization may be used in addition to ordinary vector quantization. Encoding is performed by expressing the code vector index determined in this way as a binary number, and the vector quantization code VQC is sent to the multiplexing unit 40.
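The interleaved division and the weighted codebook search can be sketched as follows. The squared Euclidean distance, the function names, and the tiny two-dimensional codebook are assumptions for illustration; the specification does not fix the distance measure or the codebook contents.

```python
def interleave_split(coeffs, num_vectors):
    """Divide coefficients into num_vectors sub-vectors by interleaving."""
    return [coeffs[i::num_vectors] for i in range(num_vectors)]


def weighted_vq_search(target, weights, codebook):
    """Return the index of the code vector that, after being multiplied
    element-wise by the weights, is closest to the target vector."""
    best_index, best_dist = 0, float("inf")
    for index, code in enumerate(codebook):
        dist = sum((t - w * c) ** 2 for t, w, c in zip(target, weights, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


# Hypothetical codebook, target vector, and weights.
codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, -1.0], [-1.0, 1.0]]
target = [3.9, -0.6]
weights = [4.0, 0.5]
index = weighted_vq_search(target, weights, codebook)
sub_vectors = interleave_split([0, 1, 2, 3, 4, 5], 2)
```

Here the weighted code vector [4.0, -0.5] is nearest to the target, so code vector 2 is chosen, and its index expressed as a binary number would form the code.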
[0025]
The scalar quantization unit 35 scalar-quantizes the frequency domain coefficients of each small band from the coefficient distribution unit 32 using the weights from the weight distribution unit 33, and sends the quantization indices to the lossless compression unit 36. Prior to quantization, the input coefficients are combined to form one or more quantization units. Good results can be obtained by configuring one unit for each small band. Each quantization value is multiplied by the weight from the weight distribution unit 33, and the quantization value closest to the frequency domain coefficient at the desired quantization accuracy is determined. For this quantization, the quantization accuracy (for example, the number of bits) must be determined, and it is set for each quantization unit. The accuracy is desirably determined from the minimum quantization error that must be guaranteed, which in turn is determined from the power of the frequency domain coefficients and the spectrum shape.
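A minimal sketch of weighted scalar quantization for one quantization unit follows. It assumes uniformly spaced integer quantization values in a symmetric range set by the number of bits, so that the reproduced value is the index multiplied by the weight; the specification does not prescribe this particular quantizer, and the function names are hypothetical.

```python
def scalar_quantize(coeffs, weight, bits):
    """Quantize one quantization unit to signed integer indices.

    The quantization values are the integers -(2**(bits-1) - 1) .. 2**(bits-1) - 1
    (an assumed layout); each is conceptually multiplied by the weight, and the
    one closest to the coefficient is selected.
    """
    max_index = (1 << (bits - 1)) - 1
    indices = []
    for c in coeffs:
        q = round(c / weight)  # nearest integer quantization value
        indices.append(max(-max_index, min(max_index, q)))  # clip to range
    return indices


def scalar_dequantize(indices, weight):
    """Reproduce coefficients from indices (decoder side)."""
    return [q * weight for q in indices]


coeffs = [1.3, -0.2, 3.6, 0.1]
idx4 = scalar_quantize(coeffs, weight=0.5, bits=4)  # range -7..7
idx3 = scalar_quantize(coeffs, weight=0.5, bits=3)  # range -3..3, clips 3.6
restored = scalar_dequantize(idx4, 0.5)
```

Lowering the number of bits narrows the index range, which is why the accuracy has to be chosen per quantization unit and conveyed to the decoder.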
[0026]
The quantization accuracy information determined here must be encoded in some form and sent to the decoder. The simplest method is to encode the number of quantization bits required to satisfy the quantization accuracy. Alternatively, when entropy coding is performed by the lossless compression unit 36, the quantization accuracy information can be conveyed to the decoder through the Huffman code table used for Huffman coding or the symbol appearance frequency table used for arithmetic coding.
[0027]
The weighting in the above-described vector quantization and scalar quantization is essentially the same as the flattening by the flattening unit 12 described with reference to FIG. 1. That is, instead of multiplying the code vector in the vector quantization or the quantization value in the scalar quantization shown in FIG. 4 by the weight, the frequency domain coefficients may be divided by the weight (that is, flattened) and the quotient vector-quantized or scalar-quantized; the coding result is the same, and only the processing procedure differs.
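The equivalence stated above can be checked numerically for the scalar case. In this sketch the set of quantization values `levels`, the function names, and the sample values are hypothetical; a positive weight is assumed.

```python
levels = [-3, -2, -1, 0, 1, 2, 3]  # assumed quantization values


def quantize_weighted_levels(x, w):
    """Multiply each quantization value by the weight, pick the one closest to x."""
    return min(levels, key=lambda q: abs(x - w * q))


def quantize_flattened(x, w):
    """Divide x by the weight (flatten), then pick the closest quantization value."""
    return min(levels, key=lambda q: abs(x / w - q))


# (coefficient, weight) pairs chosen to avoid exact ties.
samples = [(5.3, 2.0), (-0.7, 0.5), (1.1, 4.0), (-9.0, 2.5)]
results = [(quantize_weighted_levels(x, w), quantize_flattened(x, w))
           for x, w in samples]
```

Both procedures select the same index because |x - w q| = w |x / w - q| for w > 0. For a vector with per-coefficient weights the corresponding identity is sum_i (x_i - w_i c_i)^2 = sum_i w_i^2 (x_i / w_i - c_i)^2, so the search on flattened coefficients matches when its distance measure carries the factor w_i^2.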
[0028]
The lossless compression unit 36 performs lossless compression encoding on the quantization indices obtained by the scalar quantization unit 35 and supplies the resulting compressed scalar quantization code CSQC to the multiplexing unit 40. Prior to lossless compression encoding, one or more quantization units are put together to form an encoding unit. In the example of FIG. 5, in the processing of channel 0, two quantization units SQU0 and SQU1 are combined into one encoding unit SQCU0, and two quantization units SQU2 and SQU3 are combined into one encoding unit SQCU1. The same processing is performed for channel 1.
[0029]
The encoding unit may be configured by grouping a fixed number of quantization units, or by grouping quantization units having similar quantization accuracy. In the latter case, the configuration information of the encoding unit must be encoded and sent to the decoder. As a method of lossless compression, entropy coding such as Huffman coding or arithmetic coding can be used; in addition, run-length coding is effective when runs of the quantization value 0 are long. When entropy coding is performed, a Huffman code table must be provided for Huffman coding, and a table of symbol appearance frequencies for arithmetic coding. These tables are provided for each encoding unit.
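Run-length coding of long zero runs can be sketched as below. The (zero run, value) pair format and the function names are assumptions for illustration; the specification does not define a concrete run-length format.

```python
def run_length_encode(indices):
    """Encode indices as (zero_run, value) pairs.

    Each pair means: zero_run zeros followed by one nonzero value.
    A trailing run of zeros is emitted as (zero_run, 0).
    """
    pairs, run = [], 0
    for q in indices:
        if q == 0:
            run += 1
        else:
            pairs.append((run, q))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def run_length_decode(pairs):
    """Invert run_length_encode."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value != 0:
            out.append(value)
    return out


indices = [0, 0, 0, 5, 0, 0, -1, 0, 0, 0, 0]
pairs = run_length_encode(indices)
decoded = run_length_decode(pairs)
```

Eleven indices are reduced to three pairs, and decoding restores the original sequence exactly, which is the lossless property required of the lossless compression unit 36.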
[0030]
The multiplexing unit 40 multiplexes the coding model selection information code QMSC, the weight code WC, the vector quantization code VQC, and the compressed scalar quantization code CSQC, and outputs the multiplexed code bit string, for example by writing it to a storage medium or sending it to the decoding device.
Next, the decoding device 10B, which decodes codes encoded by the encoding method of this invention, will be described.
As shown in FIG. 2, the input multiplexed code bit string is demultiplexed by the demultiplexing unit 50 to obtain the coding model selection information code QMSC, the weight code WC, the vector quantization code VQC, and the compressed scalar quantization code CSQC. The coding model selection information code QMSC is reproduced as selection information in the coding model selection / reproduction unit 60 and is provided to the coefficient reproducing unit 70.
Encoding model selection / reproduction unit 60
The coding model selection / reproduction unit 60 receives the bit string of the coding model selection information code QMSC and reproduces the coding model selection information. If the coding model selection information has been losslessly compressed, decoding corresponding to the compression method is performed to obtain the binary coding model selection information. If lossless compression has not been applied, the bit string is converted into binary integers and used as the coding model selection information. The coding model selection information obtained in this way is sent to the coefficient reproducing unit 70.
Coefficient reproducing unit 70
The coefficient reproducing unit 70 receives the coefficient codes output from the demultiplexing unit 50 (the vector quantization code VQC, the compressed scalar quantization code CSQC, and the weight code WC), which are the coded output of the quantization / encoding unit 30 in the encoding device 10A, together with the selection information. It performs decoding corresponding to the encoding method specified by the selection information to reproduce the frequency domain coefficients, and provides them to the frequency / time conversion unit 80.
[0031]
FIG. 6 shows details of the coefficient reproducing unit 70. The coefficient reproducing unit 70 includes a lossless compression decoding unit 71, a scalar quantization reproduction unit 72, a frequency domain reconstruction unit 73, a vector quantization reproduction unit 74, a weight reproduction unit 75, and a weighting unit 76.
Of the code bit strings separated by the demultiplexing unit 50, the bit string of the vector quantization code VQC is restored to a quantization index in the vector quantization reproduction unit 74 by interpreting the binary number as an integer; the codebook is then referenced, the corresponding vector is read out, and the vector is reproduced as unweighted small band coefficients. When the vector quantization unit 34 of the encoding device 10A is composed of a plurality of vector quantizers, the vectors reproduced by vector inverse quantization are reassembled using the same rule as that of the vector quantization unit 34 to obtain the unweighted small band coefficients. The reproduced unweighted small band coefficients are sent to the frequency domain reconstruction unit 73.
[0032]
The bit string of the compressed scalar quantization code CSQC from the demultiplexing unit 50 is converted into a scalar quantization index in the lossless compression decoding unit 71 by performing decoding corresponding to the method of lossless compression coding on the encoding device side. The obtained data is sent to the scalar quantization reproduction unit 72.
The scalar quantization reproduction unit 72 reproduces the unweighted small band coefficient by performing scalar inverse quantization on the scalar quantization index received from the lossless compression decoding unit 71 for each quantization unit and restoring the quantization value. The reproduced unweighted small band coefficient is sent to the frequency domain reconstruction unit 73.
[0033]
In the frequency domain reconstruction unit 73, the unweighted small band coefficients of each quantization unit sent from the scalar quantization reproduction unit 72 and the vector quantization reproduction unit 74 are rearranged according to the coding model selection information received from the coding model selection / reproduction unit 60 to reconstruct the unweighted frequency domain coefficients.
The weight reproduction unit 75 receives the bit string of the weight code WC from the demultiplexing unit 50 and reproduces the weight. The weighting unit 76 obtains the frequency domain coefficients by multiplying the unweighted frequency domain coefficients constructed by the frequency domain reconstruction unit 73 by the weight obtained by the weight reproduction unit 75.
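The reconstruction and weighting steps can be sketched together. The sketch assumes one weight per small band, that each quantizer returns its decoded bands in ascending band order, and that a selection flag of 0 means scalar quantization; the function name `reconstruct` and the sample values are hypothetical.

```python
def reconstruct(selection, scalar_bands, vector_bands, weights):
    """Restore frequency order from the two decoded streams and apply weights.

    selection[i] == 0: band i was scalar-quantized; 1: vector-quantized (assumed).
    scalar_bands / vector_bands: decoded unweighted bands, in band order.
    weights[i]: reproduced weight of band i (one weight per band assumed).
    """
    scalar_iter, vector_iter = iter(scalar_bands), iter(vector_bands)
    coeffs = []
    for flag, w in zip(selection, weights):
        band = next(vector_iter) if flag else next(scalar_iter)
        coeffs.extend(w * c for c in band)  # weighting unit 76
    return coeffs


selection = [0, 1, 0]
scalar_bands = [[1.0, 2.0], [3.0, -1.0]]  # bands 0 and 2
vector_bands = [[0.5, -0.5]]              # band 1
weights = [2.0, 4.0, 0.5]
spectrum = reconstruct(selection, scalar_bands, vector_bands, weights)
```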
Frequency / time converter 80
The frequency / time conversion unit 80 performs frequency / time conversion on the frequency domain coefficients from the coefficient reproducing unit 70 and outputs an audio signal. As the frequency / time conversion method, an inverse discrete cosine transform (IDCT) or an inverse modified discrete cosine transform (IMDCT) can be used. When the inverse modified discrete cosine transform is used, N input coefficients are transformed to obtain 2N time-domain samples. After these samples are multiplied by the window function, the first N samples of the current frame and the last N samples of the immediately preceding frame are added together, and the resulting N samples are output.
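The IMDCT with windowing and overlap-add can be sketched with a direct (non-fast) transform. The sine window, the 2/N scaling, and all function names are assumptions chosen so that the round trip reconstructs the input; the specification does not name a window. The forward `mdct` is included only to exercise the inverse.

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame, n_coeffs):
    """Forward MDCT of 2N windowed samples (used here only to test the inverse)."""
    n0 = 0.5 + n_coeffs / 2
    return [sum(frame[n] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scale 2/N)."""
    n_coeffs = len(coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [2.0 / n_coeffs
            * sum(coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                  for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]


def decode_frame(coeffs, prev_tail, window):
    """Window the IMDCT output and add its first N samples to the last N
    samples kept from the previous frame. Returns (N outputs, new tail)."""
    n_coeffs = len(coeffs)
    windowed = [w * y for w, y in zip(window, imdct(coeffs))]
    output = [a + b for a, b in zip(windowed[:n_coeffs], prev_tail)]
    return output, windowed[n_coeffs:]


N = 4  # tiny frame size for illustration only
window = sine_window(N)
signal = [float(i % 5) - 2.0 for i in range(4 * N)]
tail = [0.0] * N
decoded = []
for start in range(0, len(signal) - 2 * N + 1, N):
    frame = [w * x for w, x in zip(window, signal[start:start + 2 * N])]
    out, tail = decode_frame(mdct(frame, N), tail, window)
    decoded.extend(out)
```

The first N output samples have no preceding frame to cancel their aliasing, so only the samples from index N onward are expected to match the input.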
[0034]
As the value of N, any value applicable to the time / frequency conversion algorithm may be used, but a value between 128 and 2048 is most effective. When the encoder adaptively switches the value of N according to the characteristics of the input signal, for example N = 2048 in the steady state, N = 512 when the input sound is transient, and N = 128 when it is even more strongly transient, the value of N is determined according to the information on N passed from the encoding device.
[0035]
FIG. 7 shows the configuration when the encoding method according to the present invention and the decoding method for decoding codes encoded by this encoding method are carried out by a computer. The computer 100 includes a CPU 110, a RAM 120, a ROM 130, an input / output interface 140, and a hard disk 150, connected to each other via a bus 180. The ROM 130 stores a basic program for operating the computer 100, and the hard disk 150 stores in advance programs for executing the encoding method according to the present invention and the decoding method for decoding codes encoded by this encoding method.
[0036]
For example, at the time of encoding, the CPU 110 loads the encoding program from the hard disk 150 into the RAM 120, encodes the audio signal samples input from the interface 140 by processing according to the encoding program, and outputs the resulting code from the interface 140. At the time of decoding, the decoding program is loaded from the hard disk 150 into the RAM 120, and the input code is processed according to the decoding program to output audio signal samples.
[0037]
As the programs for executing the encoding method according to the present invention and the decoding method for decoding codes encoded by this encoding method, programs recorded on an external disk device 170 connected to the internal bus 180 via the driving device 160 may be used. Alternatively, the programs may be downloaded from an external network via the interface 140 and stored on the hard disk 150. The storage medium on which the program for executing the encoding method according to the invention is recorded may be a magnetic recording medium, an IC memory, a compact disk, or the like.
[0038]
[Effect of the invention]
When the present invention is utilized, in encoding an audio signal at a low bit rate, it is possible to perform highly efficient encoding adapted to the characteristics of an input sound.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a general form of a transform encoding method using vector quantization.
FIG. 2 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 3A is a flowchart illustrating an example of an encoding model selection algorithm, FIG. 3B is a flowchart illustrating another selection algorithm, FIG. 3C is a flowchart illustrating another selection algorithm, and FIG. Flow diagram.
FIG. 4 is a block diagram showing a detailed configuration of a quantization / encoding unit in the embodiment of the present invention.
FIG. 5 is a diagram showing a configuration example of an input frequency domain coefficient, a small band, a coding model selection, a quantization unit, and a coding unit.
FIG. 6 is a block diagram showing a detailed configuration of a coefficient reproducing unit in the embodiment of the present invention.
FIG. 7 is a block diagram showing the configuration of a computer that executes, by a program, the encoding method of the present invention and the decoding method for decoding codes encoded by this encoding method.

Claims

An audio signal encoding method for inputting a discrete sample sequence of an audio signal and outputting a digital code, the method comprising the steps of:
(a) performing time / frequency conversion on the discrete sample sequence for each input of a fixed number of samples to obtain frequency domain coefficients;
(b) dividing the frequency domain coefficients into small bands, each grouping a plurality of coefficients;
(c) for each of the small bands, generating selection information that selects an encoding method using predetermined scalar quantization when the flatness of the shape of the frequency domain coefficients constituting the small band is smaller than a predetermined value and otherwise selects an encoding method using predetermined vector quantization, encoding the selection information, and outputting it as a selection information code; and
(d) encoding each of the small bands according to the selection information to generate a coefficient code, and outputting the coefficient code.

A computer-readable storage medium on which is recorded a program for causing a computer to execute each step of the audio signal encoding method according to claim 1.