JP3454394B2 - Quasi-lossless audio encoding device

Info

Publication number: JP3454394B2
Application number: JP18349495A
Authority: JP
Inventors: 徳彦渕上; 昭治植野
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 1995-06-27
Filing date: 1995-06-27
Publication date: 2003-10-06
Anticipated expiration: 2018-10-06
Also published as: JPH0916199A

Description

【発明の詳細な説明】【０００１】【産業上の利用分野】本発明は、オーディオ信号を所定
の区間毎に高能率符号化する音声の準可逆符号化装置に
関する。【０００２】【従来の技術】ＣＤ（コンパクトディスク）は１９８２
年に登場して十数年が経過し、現在では様々な展開によ
りディジタルストレージメディアとして定着している。
オーディオメディアの用途を考えると、サンプリング周
波数ｆs ＝４４．１ｋＨｚ、量子化ビット数＝１６ビッ
トのこのメディアは完全に成熟期に入っている。また、
この数年のスタジオ製作サイドでは、量子化ビット数の
２０ビット化、２４ビット化やｆs ＝８８．２ｋＨｚ
化、９６ｋＨｚ化等のハイサンプリング化が進んでお
り、より高音質のマスタを基にしてＣＤを作成する動き
が出てきている。【０００３】その理由としては、編集段階では余裕のあ
るフォーマットで作業を行うことによりＣＤの出来上が
りの音質を向上させることができるからであり、また、
そもそもｆs ＝４４．１ｋＨｚ、１６ビットの情報量で
は満足することができないという傾向が出てきた等が考
えられる。この要求を受けて、民生用機器においても再
生時に疑似的に１６ビット→２０ビット変換を行った
り、疑似的に超高域信号を付加する方法等で高音質化を
実現することが行われている。【０００４】一方、ビデオＣＤ、ＭＤ（ミニディス
ク）、ＤＣＣ（デジタル・コンパクト・カセット）のよ
うに、音声信号を高能率符号化することにより伝送効率
を向上させるシステムも実現されており、例えばＤＣＣ
では聴覚心理モデルを利用してＰＣＭの１／４の符号量
で聴感上ほとんどＣＤと遜色のない音質を実現してい
る。【０００５】ここで、高音質の次世代オーディオメディ
アを考えると、必要な情報量としては、２０ビット、ｆ
s ＝８８．２ｋＨｚ（以下「２０８８」という。）等が
考えられる。この場合の伝送レートはＣＤの２．５倍と
なり、符号化方式として単純なＰＣＭを用いることは不
経済であると言える。なお、ＰＣＭに代わる符号化方式
として最近各方面で応用が進んでいる高能率符号化方式
は、・可逆符号化（ロスレス符号化、可逆圧縮、無雑音圧縮
等）・非可逆符号化（非可逆圧縮、有雑音圧縮等）の２つの方式に大きく分類される。前者はテキストデー
タのアーカイバ等で主に実用化されており、後者はＭＰ
ＥＧ国際標準が代表例である。【０００６】「２０８８」のオーディオデータをＰＣＭ
に代わって高能率に伝送するためには、できれば可逆符
号化方式が望ましい。しかしながら、この場合の圧縮率
は曲によって大きく異なり、特にノイズライクな曲ほど
圧縮率を上げることができないことは情報理論からも明
らかである。可逆符号化方式による「２０８８」の圧縮
率は１曲平均でおよそ２５％〜５０％（データ量７５％
〜５０％に圧縮）になると予想され、設計時には最悪値
を想定するので、２５％程度の効率アップとなる。この
場合、曲平均を想定しているので瞬間的には殆ど圧縮す
ることができないフレームも当然存在し、したがって、
可変伝送レート方式を採る必要がある。この結果、オー
サリングも煩雑であり、時間を要する。図５は可逆符号
化した場合の各フレームビット数、平均ビット数及び原
音ビット数の関係の例を示す。【０００７】一方、非可逆符号化方式の場合には、聴覚
心理モデルを最適に利用すれば１／４に圧縮しても何ら
聴覚上の劣化は感じられず、「２０８８」の圧縮率は７
５％（データ量２５％に圧縮）が十分可能である。但
し、実際の高音質次世代メディアではおそらくは記録密
度が向上することと、非可逆符号化で編集を繰り返した
ときの劣化を考えると、これほどの高圧縮率は必要な
い。【０００８】そこで、高音質次世代メディアを考える
と、以下のような選択肢が考えられる。圧縮率が「０」で可能な場合→従来どおりのＰＣＭ圧縮率が２５％以内であって可変伝送レートを用いる
場合→可逆符号化圧縮率が２５％〜５０％程度必要であって可変伝送レ
ートを用いる場合→可逆符号化状態から一部又は全部の
フレームの使用可能符号量（ビット数）を減少させて対
応（一部又は全部を非可逆符号化）圧縮率が５０％以上必要な場合→聴覚心理モデルを利
用した非可逆符号化【０００９】ここで、可逆符号化を行う場合には、大き
く分けて次の２つの方法が考えられる。・時間領域で線形予測を行い、残差を量子化・符号化す
る方法。・信号を周波数領域に変換し、エネルギの偏りを利用し
て正規化し、量子化・符号化する方法。なお、前者の方法は線形予測の効果に限界があるので、
符号化にエントロピー符号化を用いて効率を高めるのが
一般的であり、そのエンコーダを図６に示す。また、後
者の場合には直交変換と正規化がかなりのデータ削減効
果があり、符号化は補助的な役割で用いられ、そのエン
コーダを図７に示す。なお、いずれの場合にも、達成さ
れる圧縮率に大きな差はない。【００１０】図６に示す時間領域処理のエンコーダで
は、線形予測残差出力部１がＰＣＭ信号を時間領域で線
形予測を行い、その残差を出力する。最も効果的な方法
は、各フレームにおいて残差が最小となる線形予測計数
を最小二乗法等で算出する方法である。線形予測の例と
しては直線予測の場合、以下の式で残差を出力する。ｄ〔ｉ〕＝ｘ〔ｉ〕−（２＊ｘ〔ｉ−１〕−ｘ〔ｉ−
２〕）但し、ｘ〔〕は入力信号系列、ｄ〔〕は予測残差系列【００１１】量子化・符号化部２はこの線形予測残差ｄ
〔〕を予め定めたフレーム毎に正規化し、可逆に必要な
精度で量子化する。この場合、量子化値にはエントロピ
ー符号化（例えばハフマン符号、Lempel-Ziv符号等）を
施して更に符号量を削減するのが一般的である。また、
量子化・符号化する場合、符号量制御部３の指示により
量子化ビット数をほぼ一様に増加又は減少させてそのフ
レームにおいて使用可能な符号量に合わせる必要があ
り、また、符号量が余剰な場合にはパディングビットを
付加して調整することもできる。フォーマット出力部４
は一般に、線形予測残差出力部１の予測方式（予測計
数）と、量子化・符号化部２の正規化係数（場合によっ
ては量子化ビット数）と、符号量制御部３の符号量制御
情報と、それにヘッダ等の補助情報を付加してフォーマ
ット化（ビットストリーム化）して伝送する。【００１２】図７に示す周波数領域処理のエンコーダで
は、バッファ１１は後段の窓掛け・直交変換部１２が直
交変換する際に必要なフレーム分のＰＣＭ信号をバッフ
ァリングする。そして、窓掛け・直交変換部１２はこの
フレームデータに窓掛け（一般にはハニング窓等の窓掛
け）し、ＭＤＣＴ（変形離散コサイン変換）等により直
交変換し、この直交変換係数を複数のバンドに分割す
る。正規化部１３はこのバンド毎の正規化係数（スケー
ルファクタ）を決定し、バンド内の直交変換係数を正規
化する。【００１３】量子化・符号化部１４はこの正規化後の係
数を可逆に必要な精度で量子化し、この場合にも必要で
あればエントロピー符号化する。但し、図６に示す時間
領域処理の場合よりエントロピー符号化の効果は一般に
少ない。また、量子化・符号化する場合、符号量制御部
１５の指示により量子化ビット数をほぼ一様に増加又は
減少させてそのフレームにおいて使用可能な符号量に合
わせる必要があり、また、符号量が余剰な場合にはパデ
ィングビットを付加して調整することもできる。フォー
マット出力部１６は一般に、正規化係数（場合によって
は量子化ビット数）と、符号量制御部３の符号量制御情
報と、それにヘッダ等の補助情報を付加してフォーマッ
ト化（ビットストリーム化）して伝送する。【００１４】図８は図７に示すエンコーダにおける周波
数領域の正規化・量子化の処理を示している。この場
合、各バンドの最大値（を１〜２ｄＢ刻みで量子化した
値）を正規化係数＜Ｓ＞とし、可逆に必要なビット数に
ついては、想定されるＰＣＭ原信号の量子化ノイズレベ
ル（ホワイトノイズであるのでレベルはフラットなは
ず）と同等以下にノイズレベル＜Ｎ＞を設定し、各バン
ドのＳ／Ｎを満足するビット数で再量子化する。【００１５】この時の必要情報量は、図中の矩形領域で
仕切られた面積であり、エネルギに偏りがある信号で
は、原音情報量よりかなり削減できることが分かる。な
お、時間領域処理において線形予測残差をとることは、
信号をフィルタリングして残差スペクトルを平均化する
処理に相当し、図８に示す周波数領域の処理をラフに実
現することと等しい。【００１６】【発明が解決しようとする課題】しかしながら、上記４
つの選択肢の中で、及びは特に問題となること
はないが、については伝送レートを調整するために量
子化ビット数を削減する場合に、図５に示すように（ａ）曲全体で一様に過剰ビット分を負担して削減す
る。（ｂ）特に過剰ビットとなったフレームを優先してフレ
ーム内で一様にビットを削減する。という考え方がある。【００１７】ここで、可逆符号化におけるデータ量を考
えると、信号がノイズに近くてエントロピーが大きい場
合にはデータ量が多く（圧縮率が低く）、信号がトーン
ライクであってエントロピーが小さい場合にはデータ量
が少ない（圧縮率が高い）。逆に、聴覚心理モデルから
考えると、信号がノイズライクなほど聴感エントロピー
は小さく、情報量は小さい（圧縮率は高くてもよい）と
言える。すなわち、数学エントロピーと聴感エントロピ
ーは反比例の関係にある。【００１８】したがって、上記（ａ）の場合には、
（ａ）聴感エントロピーが低いフレームの影響が、聴感
エントロピーが高いフレームに影響を及ぼして本来必要
なデータ量を削減してしまうという問題点があり、
（ｂ）の場合にも（ｂ）ビット削減が聴覚心理モデルに
のっとっていないので、聴感上最適な手法とは言えない
という問題点がある。また、何れの場合にも可変伝送レ
ート方式であるので、（ｃ）オーサリング時に曲全体で
ビット不足となったときの処理に時間を要し（最悪２パ
スでオーサリングする必要があり）、また、再生時には
確実なヘッド管理と、ある程度の読み込みバッファが必
要になる、のように符号量制御に関わる処理が煩雑にな
るという問題点が考えられる。【００１９】本発明は上記の問題点に鑑み、本来必要な
データ量を削減することを防止し、聴感上の音質が悪化
することを防止し、符号量制御の煩雑さを防止すること
ができる音声の準可逆符号化装置を提供することを目的
とする。【００２０】【課題を解決するための手段】本発明は上記目的を達成
するために、可逆方式で量子化するのに必要な符号量が
使用可能符号量より不足する場合に聴覚心理モデルに基
づいて非可逆方式で再量子化するようにしている。すな
わち、本発明によれば、オーディオ信号を所定の区間長
ごとにフレーム化する手段と、前記フレーム内の信号を
直交変換して得られた係数を複数のバンドに分割して正
規化する手段と、前記フレーム内の正規化された前記係
数を可逆方式で符号化するのに必要な符号量を算出し、
フレームで使用可能な所定の符号量と比較する符号量制
御手段と、前記フレーム内の正規化された前記係数を聴
覚心理モデルで分析する聴覚心理分析手段と、可逆方式
で符号化するのに必要な符号量が使用可能な所定の符号
量以下の場合にはフレーム内の正規化された前記係数を
可逆方式で量子化し、可逆方式で符号化するのに必要な
符号量が使用可能な所定の符号量を超える場合には前記
フレーム内の正規化された前記係数を前記聴覚心理分析
手段の出力に基づいて非可逆方式で量子化する量子化手
段と、前記量子化手段の出力に、復号化に必要な補助情
報とを付加して固定ビットレートのビットストリームに
フォーマット化する手段とを、有する音声の準可逆符号
化装置が提供される。【００２１】【作用】本発明では、可逆方式で量子化するのに必要な
符号量が使用可能符号量より不足しない場合にはそのま
ま可逆方式で量子化するので、本来必要なデータ量を削
減することを防止することができる。また、不足する場
合に聴覚心理モデルに基づいて非可逆方式で再量子化す
るので、聴感上の音質が悪化することを防止することが
でき、更に、固定伝送レートで伝送することができるの
で符号量制御の煩雑さを防止することができる。【００２２】【実施例】以下、図面を参照して本発明の実施例を説明
する。図１は本発明に係る音声の準可逆符号化装置の一
実施例を示すブロック図、図２は図１における聴覚心理
分析と符号量調整処理を説明するためのフローチャー
ト、図３は図１の準可逆符号化装置と従来例における符
号量不足時の再量子化ノイズレベルの比較例を示す説明
図、図４は図１の準可逆符号化装置と従来例における聴
感上の音質比較例を示す説明図である。【００２３】図１に示す装置では、先ず、図７に示す周
波数領域処理のエンコーダと同様に、バッファ２１が後
段の窓掛け・直交変換部２２が直交変換する際に必要な
フレーム分のＰＣＭ信号をバッファリングし、窓掛け・
直交変換部２２はこのフレームデータに窓掛け（一般に
はハニング窓等の窓掛け）し、ＭＤＣＴ（変形離散コサ
イン変換）等により直交変換し、この直交変換係数を複
数のバンドに分割する。正規化部２３はこのバンド毎の
正規化係数（スケールファクタ）を決定し、バンド内の
直交変換係数を正規化する。量子化・符号化部２４はこ
の正規化後の係数を可逆に必要な精度で量子化し、この
場合にも必要であればエントロピー符号化する。但し、
図６に示す時間領域処理の場合よりエントロピー符号化
の効果は一般に少ない。【００２４】そして、本実施例では、聴覚心理分析部２
５と符号量制御部２６及び量子化・符号化部２４が図２
に示すような処理を行う。図２において、先ず、量子化
・符号化部２４により正規化された係数の１回目の量子
化ビット数（Bit[i]）を決定し、符号量を見積もって総
符号量（Total bit ）を算出する（ステップＳ１）。次
いでそのフレームの使用可能符号量（Avail bit ）を確
認又は算出し（ステップＳ２）、次いで総符号量（Tota
l bit ）と使用可能符号量（Avail bit ）を比較するこ
とにより符号量が不足するか否かをチェックする（ステ
ップＳ３）。【００２５】そして、符号量が不足する場合（Total bi
t ＞Avail bit ）には、先ず、聴覚心理モデルのマスキ
ング効果と最小可聴限特性を考慮してバンドパワーｐ
[i] （＝正規化値² ＝scale[i]² ）からマスキングカー
ブｍ[i] を算出する（ステップＳ４）。この場合、マス
キングカーブｍ[i] は基準カーブcurve[i]とバンドパワ
ーｐ[i] を畳み込み演算することにより得られる。【００２６】次いで最小可聴限とマスキングカーブから
各バンドの標準ノイズレベルＮ[i]を算出し（ステップ
Ｓ５）、次いで標準ノイズレベルＮ[i] が高いバンドか
ら１ビットずつビット削減を行うことにより不足符号量
を各バンドに振り分ける。但し、バンドｉにおいて１ビ
ット削減を行う毎にＮ[i]から６．０を減算し、ビット
削減が標準ノイズレベルＮ[i] と相似形になるようにす
る（ステップＳ６）。そして、このように各バンド毎に
最終的に決定された量子化ビット数で、量子化・符号化
部２４で再量子化及び符号化する（ステップＳ７）。【００２７】また、ステップＳ３において符号量が不足
しない場合には、余剰ビットを各バンドに割り当て又は
パディングし（ステップＳ８）、その量子化ビット数
で、量子化・符号化部２４で再量子化及び符号化する
（ステップＳ７）。フォーマット出力部２６は一般に、
正規化係数（場合によっては量子化ビット数）と、符号
量制御部２６の符号量制御情報と、それにヘッダ等の補
助情報を付加してフォーマット化（ビットストリーム
化）して伝送する。【００２８】図３は上記実施例と、図７に示すエンコー
ダにおいて符号量不足時の再量子化ノイズレベルの設定
例を比較した場合を示している。上記実施例によれば、
再量子化ノイズ聴覚心理モデルに応じてシェーピングさ
れており、ノイズ量が同じであっても聴感上ではノイズ
レベルが下がった場合と同等の効果を得ることができ
る。したがって、聴感上の音質劣化を最小限にして準可
逆的に符号化することができる。【００２９】図４は従来例（図７）の非可逆符号化を行
った場合と、上記実施例の場合の音質の比較例を示し、
図４（ａ）はフレームの一部が非可逆となる場合、図４
（ｂ）はフレームの大部分が非可逆となる場合を示す。
図のように非可逆となる区間において太線で示す本発明
の方が細線で示す従来例より音質を改善することがで
き、したがって、符号化全体として安定した音質を得る
ことができる。また、本発明によれば、非可逆符号化を
行った場合の音質を十分確保することができるので、各
フレームの使用可能符号量が一定の「固定伝送レート」
で伝送することができ、したがって、非可逆フレームが
大幅に増加しても音質上の問題は発生しない。この結
果、オーサリングや再生装置側の符号量制御に関わる処
理を大幅に簡略化することができる。【００３０】【発明の効果】以上説明したように本発明によれば、可
逆方式で量子化するのに必要な符号量が使用可能符号量
より不足しない場合にはそのまま可逆方式で量子化する
ので本来必要なデータ量を削減することを防止すること
ができ、また、不足する場合に聴覚心理モデルに基づい
て非可逆方式で再量子化するので聴感上の音質が悪化す
ることを防止することができ、更に、固定伝送レートで
伝送することができるので符号量制御の煩雑さを防止す
ることができる。

BACKGROUND OF THE INVENTION

1. Field of the Invention

[0001] The present invention relates to a quasi-lossless audio encoding device that encodes an audio signal with high efficiency for each predetermined interval.

2. Description of the Related Art

[0002] More than ten years have passed since the CD (Compact Disc) appeared in 1982, and through various developments it has become established as a digital storage medium. As an audio medium, this format, with a sampling frequency fs = 44.1 kHz and 16 quantization bits, has reached full maturity. On the studio production side, meanwhile, recent years have seen a move to 20-bit and 24-bit quantization and to higher sampling rates such as fs = 88.2 kHz and 96 kHz, and CDs are increasingly being produced from masters of higher sound quality.

[0003] One reason is that working in a format with headroom at the editing stage improves the sound quality of the finished CD.
Another reason is a growing tendency to find the amount of information offered by fs = 44.1 kHz and 16 bits unsatisfactory in the first place. In response to this demand, consumer equipment also pursues higher sound quality, for example by performing a pseudo 16-bit to 20-bit conversion at playback or by adding a pseudo ultra-high-frequency signal.

[0004] On the other hand, systems that improve transmission efficiency by high-efficiency coding of the audio signal have also been realized, such as Video CD, MD (MiniDisc), and DCC (Digital Compact Cassette). DCC, for example, uses a psychoacoustic model to achieve sound quality that is perceptually almost equal to CD with one quarter of the code amount of PCM.

[0005] For a next-generation audio medium of high sound quality, the required amount of information might be, for example, 20 bits at fs = 88.2 kHz (hereinafter "2088"). The transmission rate in this case is 2.5 times that of CD, so using plain PCM as the coding scheme would be uneconomical. The high-efficiency coding schemes that have recently been applied in many fields as alternatives to PCM fall broadly into two classes: lossless coding (reversible coding, lossless compression, noiseless compression, etc.) and lossy coding (irreversible compression, noisy compression, etc.). The former is used mainly in text-data archivers and the like.
The MPEG international standards are a representative example of the latter.

[0006] To transmit "2088" audio data with high efficiency in place of PCM, a lossless coding scheme is preferable if possible. However, the compression ratio then varies greatly from track to track, and information theory makes it clear that the more noise-like the track, the less the compression ratio can be raised. The compression ratio of "2088" under lossless coding is expected to average about 25% to 50% per track (data reduced to 75% to 50%), and because a design must assume the worst value, the efficiency gain is about 25%. Since these figures are averages over a track, there are naturally frames that can hardly be compressed at all at a given instant.
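As a quick check of the figures quoted above, the PCM rates can be worked out per channel. The per-channel basis is an assumption made here for illustration; the channel count cancels in the ratio.

```latex
\begin{align*}
R_{\mathrm{CD}} &= 16\ \text{bit} \times 44.1\ \text{kHz} = 705.6\ \text{kbit/s} \\
R_{2088} &= 20\ \text{bit} \times 88.2\ \text{kHz} = 1764\ \text{kbit/s} \\
R_{2088} / R_{\mathrm{CD}} &= 2.5 \\
0.75 \times R_{2088} &= 1323\ \text{kbit/s} \quad \text{(data reduced to 75\%)} \\
0.50 \times R_{2088} &= 882\ \text{kbit/s} \quad \text{(data reduced to 50\%)}
\end{align*}
```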
Because such frames exist, a variable transmission rate scheme has to be adopted. As a result, authoring also becomes cumbersome and time-consuming. FIG. 5 shows an example of the relationship among the number of bits in each frame, the average number of bits, and the number of bits of the original sound under lossless coding.

[0007] With lossy coding, on the other hand, optimal use of a psychoacoustic model allows compression to 1/4 with no perceptible degradation, so a compression ratio of 75% (data reduced to 25%) is well within reach for "2088". However, considering that recording density will probably increase in actual next-generation high-quality media, and considering the degradation caused by repeated editing of lossy-coded material, such a high compression ratio is not needed.

[0008] For next-generation high-quality media, the following options are therefore conceivable:
- A compression ratio of 0 is acceptable → PCM as before.
- A compression ratio within 25% is needed and a variable transmission rate is used → lossless coding.
- A compression ratio of about 25% to 50% is needed and a variable transmission rate is used → start from the lossless-coded state and reduce the usable code amount (number of bits) of some or all frames (some or all frames become lossy).
- A compression ratio of 50% or more is needed → lossy coding using a psychoacoustic model.

[0009] When lossless coding is performed, two broad methods are conceivable:
- Perform linear prediction in the time domain, then quantize and encode the residual.
- Transform the signal into the frequency domain, normalize it by exploiting the uneven distribution of energy, then quantize and encode it.
Because the benefit of linear prediction is limited, the former method generally relies on entropy coding to raise efficiency; its encoder is shown in FIG. 6. In the latter method, the orthogonal transform and normalization already give a considerable data reduction, and coding plays an auxiliary role; its encoder is shown in FIG. 7. The two methods do not differ greatly in the compression ratio achieved.

[0010] In the time-domain encoder shown in FIG. 6, the linear prediction residual output unit 1 performs linear prediction on the PCM signal in the time domain and outputs the residual. The most effective approach is to compute, for each frame, the linear prediction coefficients that minimize the residual, for example by the least squares method. As a simple example of linear prediction, straight-line prediction outputs the residual given by

d[i] = x[i] - (2*x[i-1] - x[i-2])

where x[] is the input signal sequence and d[] is the prediction residual sequence.
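The straight-line predictor above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and samples before the start of the sequence are assumed to be 0.

```python
def predict_residual(x):
    """Residual d[i] = x[i] - (2*x[i-1] - x[i-2]) for integer PCM samples.

    Assumption: samples before the start of the sequence are taken as 0.
    """
    d = []
    for i, sample in enumerate(x):
        prev1 = x[i - 1] if i >= 1 else 0
        prev2 = x[i - 2] if i >= 2 else 0
        d.append(sample - (2 * prev1 - prev2))
    return d


def reconstruct(d):
    """Invert predict_residual exactly (integer arithmetic, so lossless)."""
    x = []
    for i, residual in enumerate(d):
        prev1 = x[i - 1] if i >= 1 else 0
        prev2 = x[i - 2] if i >= 2 else 0
        x.append(residual + (2 * prev1 - prev2))
    return x


ramp = [10, 12, 14, 16, 18]      # a straight line is predicted exactly
print(predict_residual(ramp))    # [10, -8, 0, 0, 0]
```

After the two start-up samples the residual of a straight ramp is zero, which is why the residual sequence is cheaper to entropy-code than the input itself.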
[0011] The quantization/encoding unit 2 normalizes this linear prediction residual d[] for each predetermined frame and quantizes it with the precision required for lossless reconstruction. The quantized values are generally entropy coded (for example with Huffman or Lempel-Ziv codes) to reduce the code amount further. During quantization and encoding, the number of quantization bits must be increased or decreased almost uniformly, as instructed by the code amount control unit 3, to match the code amount usable in the frame; when code amount is left over, padding bits can be added to adjust. The format output unit 4 generally takes the prediction scheme (prediction coefficients) of the linear prediction residual output unit 1, the normalization coefficients (and in some cases the number of quantization bits) of the quantization/encoding unit 2, and the code amount control information of the code amount control unit 3, adds auxiliary information such as a header, formats the result into a bitstream, and transmits it.

[0012] In the frequency-domain encoder shown in FIG. 7, the buffer 11 buffers the PCM signal for the frames that the subsequent windowing/orthogonal transform unit 12 needs in order to perform the orthogonal transform. The windowing/orthogonal transform unit 12 applies a window to this frame data (generally a Hanning window or the like), performs an orthogonal transform such as the MDCT (modified discrete cosine transform), and divides the transform coefficients into a plurality of bands. The normalization unit 13 determines a normalization coefficient (scale factor) for each band and normalizes the transform coefficients within the band.

[0013] The quantization/encoding unit 14 quantizes the normalized coefficients with the precision required for lossless reconstruction and, if necessary, entropy codes them. The benefit of entropy coding is, however, generally smaller than in the time-domain processing of FIG. 6. During quantization and encoding, the number of quantization bits must be increased or decreased almost uniformly, as instructed by the code amount control unit 15, to match the code amount usable in the frame; when code amount is left over, padding bits can be added to adjust. The format output unit 16 generally takes the normalization coefficients (and in some cases the number of quantization bits) and the code amount control information of the code amount control unit 15, adds auxiliary information such as a header, formats the result into a bitstream, and transmits it.

[0014] FIG. 8 shows the frequency-domain normalization and quantization performed in the encoder of FIG. 7. Here the maximum value of each band (quantized in steps of 1 to 2 dB) is taken as the normalization coefficient <S>. For the number of bits required for lossless reconstruction, the noise level <N> is set at or below the assumed quantization noise level of the original PCM signal (which should be flat, since it is white noise), and each band is requantized with the number of bits that satisfies its S/N.

[0015] The amount of information required is then the area enclosed by the rectangles in the figure, and for a signal whose energy is unevenly distributed it is considerably smaller than the amount of information in the original sound. Taking the linear prediction residual in time-domain processing corresponds to filtering the signal so as to flatten the residual spectrum, and amounts to a rough realization of the frequency-domain processing shown in FIG. 8.
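The FIG. 8 bit count can be sketched as follows. This is a rough illustration under stated assumptions rather than the patent's procedure: the function names are hypothetical, the scale factor is the band peak rounded up to a 1.5 dB step (one choice inside the 1 to 2 dB range mentioned), and each quantization bit is taken to buy about 6.02 dB of S/N.

```python
import math

DB_PER_BIT = 6.02  # assumption: about 6.02 dB of S/N per quantization bit


def scale_factor_db(band, step_db=1.5):
    """Band peak level <S> in dB, rounded up to a multiple of step_db."""
    peak = max(abs(c) for c in band)
    if peak == 0:
        return None  # silent band: nothing to encode
    level = 20.0 * math.log10(peak)
    return math.ceil(level / step_db) * step_db


def lossless_bits(bands, noise_floor_db):
    """Bits per coefficient so each band's noise stays at or below <N>."""
    bits = []
    for band in bands:
        s = scale_factor_db(band)
        if s is None or s <= noise_floor_db:
            bits.append(0)
        else:
            bits.append(math.ceil((s - noise_floor_db) / DB_PER_BIT))
    return bits


# Three bands with very different energy; noise floor <N> = 0 dB (assumed).
bands = [[30000, -12000], [900, 400], [3, -2]]
print(lossless_bits(bands, noise_floor_db=0.0))  # [15, 10, 2]
```

Low-energy bands need far fewer bits than the loudest band, which is the saving that the rectangles in FIG. 8 represent.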
PROBLEMS TO BE SOLVED BY THE INVENTION

[0016] Of the four options above, most pose no particular problem. For the option in which the number of quantization bits is reduced in order to adjust the transmission rate, however, two approaches are conceivable, as shown in FIG. 5:
(a) spread the excess bits uniformly over the whole track and reduce accordingly;
(b) give priority to the frames that exceed the budget and reduce bits uniformly within those frames.

[0017] Consider the amount of data in lossless coding. When the signal is close to noise and its entropy is large, the amount of data is large (the compression ratio is low); when the signal is tone-like and its entropy is small, the amount of data is small (the compression ratio is high). From the standpoint of a psychoacoustic model, conversely, the more noise-like the signal, the smaller its perceptual entropy and the smaller its amount of information (a high compression ratio is acceptable). In other words, mathematical entropy and perceptual entropy are in an inverse relationship.

[0018] Approach (a) therefore has the problem that frames with low perceptual entropy affect frames with high perceptual entropy, so that data the latter actually need is removed.
Approach (b) also has a problem: because the bit reduction does not follow a psychoacoustic model, it cannot be called perceptually optimal. Moreover, both approaches use a variable transmission rate, so (c) the processing related to code amount control becomes complicated: during authoring, handling a bit shortage over the whole track takes time (in the worst case authoring requires two passes), and playback requires reliable head management and a read buffer of some size.

[0019] In view of the above problems, an object of the present invention is to provide a quasi-lossless audio encoding device that avoids removing data that is actually needed, prevents degradation of the perceived sound quality, and avoids complicated code amount control.

MEANS FOR SOLVING THE PROBLEMS

[0020] To achieve the above object, the present invention requantizes in a lossy manner, based on a psychoacoustic model, when the usable code amount is insufficient for the code amount required for lossless quantization. That is, the present invention provides a quasi-lossless audio encoding device comprising:
means for framing an audio signal at every predetermined interval length;
means for dividing the coefficients obtained by orthogonally transforming the signal in the frame into a plurality of bands and normalizing them;
code amount control means for calculating the code amount required to encode the normalized coefficients in the frame losslessly and comparing it with a predetermined code amount usable in the frame;
psychoacoustic analysis means for analyzing the normalized coefficients in the frame with a psychoacoustic model;
quantization means for quantizing the normalized coefficients in the frame losslessly when the code amount required for lossless encoding is equal to or less than the usable predetermined code amount, and for quantizing the normalized coefficients in the frame in a lossy manner based on the output of the psychoacoustic analysis means when the code amount required for lossless encoding exceeds the usable predetermined code amount; and
means for adding auxiliary information necessary for decoding to the output of the quantization means and formatting the result into a fixed-bit-rate bitstream.

OPERATION

[0021] In the present invention, when the usable code amount is sufficient for lossless quantization, quantization is performed losslessly as it is, so data that is actually needed is not removed. When the usable code amount is insufficient, requantization is performed in a lossy manner based on the psychoacoustic model, so degradation of the perceived sound quality is prevented. Furthermore, because transmission can take place at a fixed transmission rate, complicated code amount control is avoided.

EMBODIMENTS

[0022] An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the quasi-lossless audio encoding device according to the present invention. FIG. 2 is a flowchart explaining the psychoacoustic analysis and code amount adjustment in FIG. 1. FIG. 3 is an explanatory diagram comparing the requantization noise level when the code amount is insufficient in the quasi-lossless encoding device of FIG. 1 and in the conventional example. FIG. 4 is an explanatory diagram comparing the perceived sound quality of the quasi-lossless encoding device of FIG. 1 and of the conventional example.

[0023] In the device shown in FIG. 1, as in the frequency-domain encoder of FIG. 7, the buffer 21 first buffers the PCM signal for the frames that the subsequent windowing/orthogonal transform unit 22 needs in order to perform the orthogonal transform.
The windowing/orthogonal transform unit 22 applies a window to this frame data (generally a Hanning window or the like), performs an orthogonal transform such as the MDCT (modified discrete cosine transform), and divides the transform coefficients into a plurality of bands. The normalization unit 23 determines a normalization coefficient (scale factor) for each band and normalizes the transform coefficients within the band. The quantization/encoding unit 24 quantizes the normalized coefficients with the precision required for lossless reconstruction and, if necessary, entropy codes them. The benefit of entropy coding is, however, generally smaller than in the time-domain processing of FIG. 6.

[0024] In this embodiment, the psychoacoustic analysis unit 25, the code amount control unit 26, and the quantization/encoding unit 24 perform the processing shown in FIG. 2. First, the quantization/encoding unit 24 determines the first-pass number of quantization bits (Bit[i]) for the normalized coefficients, estimates the code amount, and calculates the total code amount (Total bit) (step S1). Next, the code amount usable for the frame (Avail bit) is confirmed or calculated (step S2), and Total bit is compared with Avail bit to check whether the code amount is insufficient (step S3).

[0025] If the code amount is insufficient (Total bit > Avail bit), the masking curve m[i] is first calculated from the band power p[i] (= normalization value² = scale[i]²), taking into account the masking effect and the minimum audible threshold of the psychoacoustic model (step S4). The masking curve m[i] is obtained by convolving the reference curve curve[i] with the band power p[i].

[0026] Next, the standard noise level N[i] of each band is calculated from the minimum audible threshold and the masking curve (step S5). The shortfall in code amount is then distributed among the bands by removing bits one at a time, starting from the band with the highest standard noise level N[i]. Each time one bit is removed from band i, 6.0 is subtracted from N[i], so that the bit reduction takes a shape similar to the standard noise level N[i] (step S6). The quantization/encoding unit 24 then requantizes and encodes using the number of quantization bits finally determined for each band (step S7).

[0027] If the code amount is not insufficient in step S3, the surplus bits are allocated to the bands or used as padding (step S8), and the quantization/encoding unit 24 requantizes and encodes with that number of quantization bits (step S7). The format output unit 27 generally takes the normalization coefficients (and in some cases the number of quantization bits) and the code amount control information of the code amount control unit 26, adds auxiliary information such as a header, formats the result into a bitstream, and transmits it.

[0028] FIG. 3 compares example settings of the requantization noise level when the code amount is insufficient, for the above embodiment and for the encoder shown in FIG. 7. In the embodiment, the requantization noise is shaped according to the psychoacoustic model, so that even with the same amount of noise the effect is perceptually equivalent to a lower noise level. Quasi-lossless encoding can therefore be performed with minimal degradation of the perceived sound quality.

[0029] FIG. 4 compares the sound quality obtained with lossy coding in the conventional example (FIG. 7) and in the above embodiment.
FIG. 4(a) shows the case where some of the frames become lossy, and FIG. 4(b) shows the case where most of the frames become lossy. As the figure shows, in the lossy sections the present invention (thick line) gives better sound quality than the conventional example (thin line), so stable sound quality is obtained over the encoding as a whole. Furthermore, because the present invention sufficiently secures the sound quality under lossy coding, transmission can use a "fixed transmission rate" in which the usable code amount of each frame is constant, and no sound quality problem arises even if the number of lossy frames increases substantially. As a result, the processing related to authoring and to code amount control on the playback device side can be greatly simplified.

EFFECTS OF THE INVENTION

[0030] As described above, according to the present invention, when the usable code amount is sufficient for lossless quantization, quantization is performed losslessly as it is, so data that is actually needed is not removed. When the usable code amount is insufficient, requantization is performed in a lossy manner based on the psychoacoustic model, so degradation of the perceived sound quality is prevented. Furthermore, because transmission can take place at a fixed transmission rate, complicated code amount control is avoided.
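The per-frame flow of FIG. 2 described above (steps S1 to S8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all function and parameter names are hypothetical, N[i] is assumed to be the larger of the threshold in quiet and the masking curve in dB (the text does not give the formula), the 6.0 step is assumed to be in dB, and surplus bits are simply padded rather than reallocated to bands.

```python
import math


def masking_curve(power, curve):
    """S4: convolve band power p[i] with an odd-length reference curve."""
    half = len(curve) // 2
    m = []
    for i in range(len(power)):
        acc = 0.0
        for k, weight in enumerate(curve):
            j = i - k + half  # band whose power reaches band i via curve[k]
            if 0 <= j < len(power):
                acc += power[j] * weight
        m.append(acc)
    return m


def standard_noise_db(power, curve, quiet_db):
    """S5: per-band noise level N[i] in dB (assumed: max of the two curves)."""
    levels = []
    for m, quiet in zip(masking_curve(power, curve), quiet_db):
        masked = 10.0 * math.log10(m) if m > 0 else float("-inf")
        levels.append(max(quiet, masked))
    return levels


def allocate(bits, widths, noise_db, avail_bits, step_db=6.0):
    """S1-S3, S6, S8 for one frame with a fixed budget avail_bits.

    bits[i]   first-pass (lossless) bits per coefficient in band i
    widths[i] number of coefficients in band i
    Returns (mode, final_bits, padding_bits).
    """
    bits, noise = list(bits), list(noise_db)
    total = sum(b * w for b, w in zip(bits, widths))    # S1
    if total <= avail_bits:                             # S3: budget suffices
        return "lossless", bits, avail_bits - total     # S8: pad the surplus
    while total > avail_bits:                           # S6: greedy reduction
        live = [i for i, b in enumerate(bits) if b > 0]
        if not live:
            break
        i = max(live, key=lambda j: noise[j])           # highest N[i] first
        bits[i] -= 1
        noise[i] -= step_db
        total -= widths[i]
    return "lossy", bits, max(0, avail_bits - total)


print(allocate([12, 12, 12], [4, 4, 4], [30.0, 20.0, 10.0], avail_bits=128))
# ('lossy', [9, 11, 12], 0)
```

Because every frame is fitted to the same budget, the output can be packed at a fixed bit rate, and bits are taken first from the bands where the model tolerates the most noise.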

【図面の簡単な説明】【図１】本発明に係る音声の準可逆符号化装置の一実施
例を示すブロック図である。【図２】図１における聴覚心理分析と符号量調整処理を
説明するためのフローチャートである。【図３】図１の準可逆符号化装置と従来例における符号
量不足時の再量子化ノイズレベルの比較例を示す説明図
である。【図４】図１の準可逆符号化装置と従来例における聴感
上の音質比較例を示す説明図である。【図５】可逆符号化した場合の各フレームビット数、平
均ビット数及び原音ビット数の関係例を示す説明図であ
る。【図６】従来の時間領域処理の可逆符号化方式エンコー
ダを示すブロック図である。【図７】従来の周波数領域処理の可逆符号化方式エンコ
ーダを示すブロック図である。【図８】図７に示すエンコーダにおける周波数領域処理
を示す説明図である。【符号の説明】２２窓掛け・直交変換部２３正規化部２４量子化・符号化部（量子化手段、再量子化手段）２５聴覚心理分析部（再量子化手段）２６符号量制御部（再量子化手段）２７フォーマット化出力部

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing an embodiment of the quasi-lossless audio encoding device according to the present invention.
FIG. 2 is a flowchart explaining the psychoacoustic analysis and code amount adjustment in FIG. 1.
FIG. 3 is an explanatory diagram comparing the requantization noise level when the code amount is insufficient in the quasi-lossless encoding device of FIG. 1 and in the conventional example.
FIG. 4 is an explanatory diagram comparing the perceived sound quality of the quasi-lossless encoding device of FIG. 1 and of the conventional example.
FIG. 5 is an explanatory diagram showing an example of the relationship among the number of bits in each frame, the average number of bits, and the number of bits of the original sound under lossless coding.
FIG. 6 is a block diagram showing a conventional lossless encoder using time-domain processing.
FIG. 7 is a block diagram showing a conventional lossless encoder using frequency-domain processing.
FIG. 8 is an explanatory diagram showing the frequency-domain processing in the encoder of FIG. 7.

DESCRIPTION OF REFERENCE NUMERALS

22 Windowing/orthogonal transform unit
23 Normalization unit
24 Quantization/encoding unit (quantization means, requantization means)
25 Psychoacoustic analysis unit (requantization means)
26 Code amount control unit (requantization means)
27 Format output unit

フロントページの続き (56)参考文献特開平６−69811（ＪＰ，Ａ) 特開昭63−15559（ＪＰ，Ａ) 特開平２−288739（ＪＰ，Ａ) 特開平７−50589（ＪＰ，Ａ) 国際公開90／010993（ＷＯ，Ａ１) (58)調査した分野(Int.Cl.⁷，ＤＢ名) G10L 19/00 G10L 19/02

Continuation of the front page
(56) References: JP-A-6-69811 (JP, A); JP-A-63-15559 (JP, A); JP-A-2-288739 (JP, A); JP-A-7-50589 (JP, A); WO 90/010993 (WO, A1)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 19/00; G10L 19/02

Claims

(57) [Claim 1] A quasi-lossless audio encoding device comprising:
means for framing an audio signal at every predetermined interval length;
means for dividing the coefficients obtained by orthogonally transforming the signal in the frame into a plurality of bands and normalizing them;
code amount control means for calculating the code amount required to encode the normalized coefficients in the frame losslessly and comparing it with a predetermined code amount usable in the frame;
psychoacoustic analysis means for analyzing the normalized coefficients in the frame with a psychoacoustic model;
quantization means for quantizing the normalized coefficients in the frame losslessly when the code amount required for lossless encoding is equal to or less than the usable predetermined code amount, and for quantizing the normalized coefficients in the frame in a lossy manner based on the output of the psychoacoustic analysis means when the code amount required for lossless encoding exceeds the usable predetermined code amount; and
means for adding auxiliary information necessary for decoding to the output of the quantization means and formatting the result into a fixed-bit-rate bitstream.