JP4921611B2

JP4921611B2 - Speech decoding apparatus, speech decoding method, and speech decoding program

Info

Publication number: JP4921611B2
Application number: JP2011271559A
Authority: JP
Inventors: 孝輔辻野; 圭菊入; 信彦仲
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2009-04-03
Filing date: 2011-12-12
Publication date: 2012-04-25
Anticipated expiration: 2030-01-12
Also published as: JP5320475B2; JP2012093794A; JP5588547B2; JP2012053493A; JP2013225152A

Abstract

PROBLEM TO BE SOLVED: To improve the subjective quality of a decoded signal by reducing pre-echo and post-echo, without significantly increasing the bit rate, in frequency-domain bandwidth extension techniques typified by SBR.

SOLUTION: A speech decoding device performs linear prediction analysis in the frequency direction, by the covariance method or the autocorrelation method, on a signal represented in the frequency domain to obtain linear prediction coefficients. It adjusts the filter strength of the obtained coefficients and then filters the signal in the frequency direction with the adjusted coefficients, thereby deforming the temporal envelope of the signal.

COPYRIGHT: (C)2012, JPO&INPIT

Description

The present invention relates to a speech encoding device, a speech decoding device, a speech encoding method, a speech decoding method, a speech encoding program, and a speech decoding program.

Speech and audio coding techniques, which use psychoacoustics to remove information that is unnecessary for human perception and thereby compress the amount of signal data by a factor of several tens, are extremely important for signal transmission and storage. A widely used example of perceptual audio coding is "MPEG4 AAC", standardized by "ISO/IEC MPEG".

Bandwidth extension techniques, which generate high-frequency components from the low-frequency components of speech, have come into wide use in recent years as a way to further improve speech coding performance and obtain high speech quality at low bit rates. A representative example is the SBR (Spectral Band Replication) technique used in "MPEG4 AAC". In SBR, a signal is transformed into the frequency domain by a QMF (Quadrature Mirror Filter) filter bank, high-frequency components are generated by copying spectral coefficients from the low-frequency band to the high-frequency band, and the high-frequency components are then adjusted by adjusting the spectral envelope and tonality of the copied coefficients. Because a speech coding scheme using bandwidth extension can reproduce the high-frequency components of a signal from only a small amount of auxiliary information, it is effective for lowering the bit rate of speech coding.
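The copy-and-adjust idea described above can be illustrated with a toy sketch. It is not the SBR algorithm of ISO/IEC 14496-3, which uses patches, per-band gains, noise addition, and inverse filtering; the function names `copy_up` and `scale_to_envelope` and the single-gain adjustment are assumptions made only for illustration.

```python
def copy_up(low_band, num_high):
    """Fill num_high high-band bins by repeating the low-band coefficients.

    Hypothetical helper: real SBR patching follows rules from the standard.
    """
    if not low_band:
        raise ValueError("low_band must not be empty")
    return [low_band[k % len(low_band)] for k in range(num_high)]


def scale_to_envelope(high_band, target_energy):
    """Apply one gain so that the band's total energy equals target_energy.

    Stands in for spectral-envelope adjustment; a single gain is an
    assumption, not the per-band adjustment used by an actual decoder.
    """
    energy = sum(abs(c) ** 2 for c in high_band)
    if energy == 0.0:
        return list(high_band)
    gain = (target_energy / energy) ** 0.5
    return [gain * c for c in high_band]
```

The target energy plays the role of the small amount of auxiliary information: the decoder never receives the high-band coefficients themselves, only a description of how loud they should be.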

Frequency-domain bandwidth extension techniques typified by SBR adjust the spectral envelope and tonality of spectral coefficients represented in the frequency domain by adjusting gains applied to the spectral coefficients, by linear prediction inverse filtering in the time direction, and by superimposing noise. When a signal whose temporal envelope varies strongly, such as a speech signal, applause, or castanets, is encoded, this adjustment can cause reverberation-like noise called pre-echo or post-echo to be perceived in the decoded signal. The problem arises because the temporal envelope of the high-frequency component is deformed during the adjustment and in many cases becomes flatter than it was before the adjustment. The flattened temporal envelope of the high-frequency component no longer matches that of the high-frequency component in the original signal before encoding, which causes pre-echo and post-echo.

A similar pre-echo and post-echo problem also occurs in multichannel audio coding that uses parametric processing, typified by "MPEG Surround" and parametric stereo. A decoder for multichannel audio coding includes means for decorrelating the decoded signal with a reverberation filter, but the decorrelation deforms the temporal envelope of the signal and degrades the reproduced signal in the same way as pre-echo and post-echo. The TES (Temporal Envelope Shaping) technique exists as a solution to this problem (Patent Document 1). TES performs linear prediction analysis in the frequency direction on the signal before decorrelation, represented in the QMF domain, to obtain linear prediction coefficients, and then uses those coefficients to apply linear prediction synthesis filtering in the frequency direction to the signal after decorrelation. In this way TES extracts the temporal envelope of the signal before decorrelation and adjusts the temporal envelope of the signal after decorrelation to match it. Because the signal before decorrelation has a temporal envelope with little distortion, this processing adjusts the temporal envelope of the decorrelated signal to a shape with little distortion and yields a reproduced signal with reduced pre-echo and post-echo.
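As a rough illustration of linear prediction along the frequency axis, the following sketch estimates coefficients from the bins of one time slot by the autocorrelation method and applies an all-pole synthesis filter across the bins of another signal. It assumes real-valued coefficients for brevity, whereas QMF samples are generally complex, and the function names are hypothetical rather than taken from the cited document.

```python
def autocorr(bins, order):
    """Autocorrelation of one time slot, taken across frequency bins."""
    return [sum(bins[n] * bins[n - lag] for n in range(lag, len(bins)))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, err) where a[0] == 1 and A(z) = sum(a[k] * z**-k) is the
    prediction-error filter; err is the residual energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or degenerate input: keep remaining taps at zero
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def synthesis_filter(bins, a):
    """All-pole filter 1/A(z) run along the frequency axis of one slot."""
    out = []
    for k, x in enumerate(bins):
        acc = x
        for j in range(1, len(a)):
            if k - j >= 0:
                acc -= a[j] * out[k - j]
        out.append(acc)
    return out
```

In a TES-like arrangement, `levinson(autocorr(reference_bins, p), p)` would be computed on the signal before decorrelation and `synthesis_filter` applied to the bins after decorrelation, so that the second signal inherits the temporal envelope of the first.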

Patent Document 1: US Patent Application Publication No. 2006/0239473

The TES technique described above relies on the signal before decorrelation having a temporal envelope with little distortion. An SBR decoder, however, reproduces the high-frequency component of the signal by copying from the low-frequency component, so it cannot obtain a low-distortion temporal envelope for the high-frequency component. One conceivable solution is for the SBR encoder to analyze the high-frequency component of the input signal, quantize the resulting linear prediction coefficients, and multiplex them into the bitstream for transmission. The SBR decoder could then obtain linear prediction coefficients that carry low-distortion information about the temporal envelope of the high-frequency component. In that case, however, transmitting the quantized linear prediction coefficients requires a large amount of information, and the bit rate of the entire encoded bitstream increases significantly. An object of the present invention is therefore to reduce pre-echo and post-echo and improve the subjective quality of the decoded signal, without significantly increasing the bit rate, in frequency-domain bandwidth extension techniques typified by SBR.

The speech encoding device of the present invention is a speech encoding device that encodes a speech signal, comprising: core encoding means for encoding a low-frequency component of the speech signal; temporal envelope auxiliary information calculating means for calculating, using the temporal envelope of the low-frequency component of the speech signal, temporal envelope auxiliary information for obtaining an approximation of the temporal envelope of a high-frequency component of the speech signal; and bitstream multiplexing means for generating a bitstream in which at least the low-frequency component encoded by the core encoding means and the temporal envelope auxiliary information calculated by the temporal envelope auxiliary information calculating means are multiplexed.

In the speech encoding device of the present invention, the temporal envelope auxiliary information preferably represents a parameter indicating how steeply the temporal envelope of the high-frequency component of the speech signal changes within a predetermined analysis interval.

The speech encoding device of the present invention preferably further comprises frequency transform means for transforming the speech signal into the frequency domain, and the temporal envelope auxiliary information calculating means preferably calculates the temporal envelope auxiliary information based on high-frequency linear prediction coefficients obtained by performing linear prediction analysis in the frequency direction on the high-frequency-side coefficients of the speech signal transformed into the frequency domain by the frequency transform means.

In the speech encoding device of the present invention, the temporal envelope auxiliary information calculating means preferably performs linear prediction analysis in the frequency direction on the low-frequency-side coefficients of the speech signal transformed into the frequency domain by the frequency transform means to obtain low-frequency linear prediction coefficients, and calculates the temporal envelope auxiliary information based on the low-frequency linear prediction coefficients and the high-frequency linear prediction coefficients.

In the speech encoding device of the present invention, the temporal envelope auxiliary information calculating means preferably obtains a prediction gain from each of the low-frequency linear prediction coefficients and the high-frequency linear prediction coefficients, and calculates the temporal envelope auxiliary information based on the relative magnitudes of the two prediction gains.
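One way to obtain a prediction gain from a set of linear prediction coefficients is to recover the reflection coefficients with the step-down recursion and use the identity gain = 1 / prod(1 - k_i^2). The sketch below does that and then maps the two gains to a single strength value. The mapping in `strength_from_gains` is purely a hypothetical example of "comparing the two gains"; this section does not give the formula, and all function names are assumptions.

```python
def reflection_coeffs(a):
    """Step-down recursion: a[0] == 1 prediction-error filter -> k_1..k_p."""
    cur = list(a)
    ks = []
    for i in range(len(cur) - 1, 0, -1):
        k = cur[i]
        if abs(k) >= 1.0:
            raise ValueError("coefficients do not describe a stable predictor")
        ks.append(k)
        denom = 1.0 - k * k
        cur = [1.0] + [(cur[j] - k * cur[i - j]) / denom for j in range(1, i)]
    return ks[::-1]


def prediction_gain(a):
    """Ratio of signal energy to residual energy implied by the predictor."""
    gain = 1.0
    for k in reflection_coeffs(a):
        gain /= 1.0 - k * k
    return gain


def strength_from_gains(gain_high, gain_low):
    """Hypothetical mapping of two prediction gains to one strength value.

    Returns 0 when the low band shows no predictability (gain <= 1).
    """
    if gain_low <= 1.0:
        return 0.0
    return max(0.0, (gain_high - 1.0) / (gain_low - 1.0)) ** 0.5
```

A gain close to 1 means the bins are nearly unpredictable along frequency, which corresponds to a flat temporal envelope; a large gain indicates a strongly peaked envelope.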

In the speech encoding device of the present invention, the temporal envelope auxiliary information calculating means preferably separates a high-frequency component from the speech signal, obtains temporal envelope information expressed in the time domain from that high-frequency component, and calculates the temporal envelope auxiliary information based on the magnitude of the temporal variation of that temporal envelope information.

In the speech encoding device of the present invention, the temporal envelope auxiliary information preferably includes difference information for obtaining high-frequency linear prediction coefficients from low-frequency linear prediction coefficients, the latter being obtained by performing linear prediction analysis in the frequency direction on the low-frequency component of the speech signal.

The speech encoding device of the present invention preferably further comprises frequency transform means for transforming the speech signal into the frequency domain, and the temporal envelope auxiliary information calculating means preferably performs linear prediction analysis in the frequency direction on each of the low-frequency component and the high-frequency-side coefficients of the speech signal transformed into the frequency domain by the frequency transform means to obtain low-frequency linear prediction coefficients and high-frequency linear prediction coefficients, and obtains the difference information by taking the difference between the low-frequency linear prediction coefficients and the high-frequency linear prediction coefficients.

In the speech encoding device of the present invention, the difference information preferably represents a difference between linear prediction coefficients in any one of the following domains: LSP (Linear Spectrum Pair), ISP (Immittance Spectrum Pair), LSF (Linear Spectrum Frequency), ISF (Immittance Spectrum Frequency), or PARCOR coefficients.

The speech encoding device of the present invention is a speech encoding device that encodes a speech signal, comprising: core encoding means for encoding a low-frequency component of the speech signal; frequency transform means for transforming the speech signal into the frequency domain; linear prediction analysis means for performing linear prediction analysis in the frequency direction on the high-frequency-side coefficients of the speech signal transformed into the frequency domain by the frequency transform means to obtain high-frequency linear prediction coefficients; prediction coefficient decimation means for decimating, in the time direction, the high-frequency linear prediction coefficients obtained by the linear prediction analysis means; prediction coefficient quantization means for quantizing the high-frequency linear prediction coefficients after decimation by the prediction coefficient decimation means; and bitstream multiplexing means for generating a bitstream in which at least the low-frequency component encoded by the core encoding means and the high-frequency linear prediction coefficients quantized by the prediction coefficient quantization means are multiplexed.

The speech decoding device of the present invention is a speech decoding device that decodes an encoded speech signal, comprising: bitstream separating means for separating an external bitstream containing the encoded speech signal into an encoded bitstream and temporal envelope auxiliary information; core decoding means for decoding the encoded bitstream separated by the bitstream separating means to obtain a low-frequency component; frequency transform means for transforming the low-frequency component obtained by the core decoding means into the frequency domain; high-frequency generating means for generating a high-frequency component by copying the low-frequency component transformed into the frequency domain by the frequency transform means from the low-frequency band to the high-frequency band; low-frequency temporal envelope analysis means for analyzing the low-frequency component transformed into the frequency domain by the frequency transform means to obtain temporal envelope information; temporal envelope adjusting means for adjusting the temporal envelope information obtained by the low-frequency temporal envelope analysis means using the temporal envelope auxiliary information; and temporal envelope deforming means for deforming the temporal envelope of the high-frequency component generated by the high-frequency generating means using the temporal envelope information adjusted by the temporal envelope adjusting means.

The speech decoding device of the present invention preferably further comprises high-frequency adjusting means for adjusting the high-frequency component, wherein the frequency transform means is a 64-band QMF filter bank with real or complex coefficients, and the frequency transform means, the high-frequency generating means, and the high-frequency adjusting means operate in conformity with the SBR (Spectral Band Replication) decoder of "MPEG4 AAC" specified in "ISO/IEC 14496-3".

In the speech decoding device of the present invention, preferably the low-frequency temporal envelope analysis means performs linear prediction analysis in the frequency direction on the low-frequency component transformed into the frequency domain by the frequency transform means to obtain low-frequency linear prediction coefficients; the temporal envelope adjusting means adjusts the low-frequency linear prediction coefficients using the temporal envelope auxiliary information; and the temporal envelope deforming means deforms the temporal envelope of the speech signal by applying linear prediction filtering in the frequency direction, using the linear prediction coefficients adjusted by the temporal envelope adjusting means, to the frequency-domain high-frequency component generated by the high-frequency generating means.
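A minimal sketch of this decoder-side step follows, assuming that the adjustment scales the k-th coefficient by rho**k (a common bandwidth-expansion form, used here as an assumption) and that the linear prediction filtering is an all-pole synthesis filter run across frequency bins. The parameter name `rho` and both function names are hypothetical.

```python
def adjust_strength(coeffs, rho):
    """Scale the k-th coefficient by rho**k; coeffs[0] is the leading 1.

    rho = 1 keeps the filter unchanged, rho = 0 reduces it to the identity.
    """
    return [c * rho ** k for k, c in enumerate(coeffs)]


def shape_high_band(bins, coeffs):
    """Run the all-pole filter 1/A(z) along the frequency axis of one slot."""
    out = []
    for k, x in enumerate(bins):
        acc = x
        for j in range(1, len(coeffs)):
            if k - j >= 0:
                acc -= coeffs[j] * out[k - j]
        out.append(acc)
    return out
```

With this form a single transmitted scalar controls how much of the low-band temporal envelope is imposed on the copied high band, which is far cheaper than sending a full set of high-band coefficients.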

In the speech decoding device of the present invention, preferably the low-frequency temporal envelope analysis means obtains the temporal envelope information of the speech signal by obtaining the power of each time slot of the low-frequency component transformed into the frequency domain by the frequency transform means; the temporal envelope adjusting means adjusts the temporal envelope information using the temporal envelope auxiliary information; and the temporal envelope deforming means deforms the temporal envelope of the high-frequency component by superimposing the adjusted temporal envelope information on the frequency-domain high-frequency component generated by the high-frequency generating means.

In the speech decoding device of the present invention, preferably the low-frequency temporal envelope analysis means obtains the temporal envelope information of the speech signal by obtaining the power of each QMF subband sample of the low-frequency component transformed into the frequency domain by the frequency transform means; the temporal envelope adjusting means adjusts the temporal envelope information using the temporal envelope auxiliary information; and the temporal envelope deforming means deforms the temporal envelope of the high-frequency component by multiplying the frequency-domain high-frequency component generated by the high-frequency generating means by the adjusted temporal envelope information.
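The power-based variants in the two preceding paragraphs can be sketched as follows. The per-slot normalization, the adjustment rule in `adjust_envelope` (moving each value toward or away from the mean by a factor `s`), and all names are assumptions made for illustration; this section does not specify the exact formulas.

```python
def slot_envelope(low):
    """low[t][k] is the coefficient at time slot t, band k.

    Returns one amplitude per slot, normalized by the mean slot power.
    """
    power = [sum(abs(c) ** 2 for c in slot) for slot in low]
    mean_power = sum(power) / len(power)
    if mean_power == 0.0:
        return [1.0] * len(power)
    return [(p / mean_power) ** 0.5 for p in power]


def adjust_envelope(env, s):
    """Hypothetical adjustment: s = 0 flattens the envelope, s = 1 keeps it."""
    mean = sum(env) / len(env)
    return [max(0.0, mean + s * (e - mean)) for e in env]


def apply_envelope(high, env):
    """Multiply every high-band coefficient of slot t by env[t]."""
    return [[e * c for c in slot] for slot, e in zip(high, env)]
```

Here `s` stands in for the transmitted temporal envelope auxiliary information: the decoder measures the envelope on the low band it already has and uses `s` only to decide how strongly to impose it on the high band.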

In the speech decoding device of the present invention, the temporal envelope auxiliary information preferably represents a filter strength parameter used to adjust the strength of the linear prediction coefficients.

In the speech decoding device of the present invention, the temporal envelope auxiliary information preferably represents a parameter indicating the magnitude of the temporal variation of the temporal envelope information.

In the speech decoding device of the present invention, the temporal envelope auxiliary information preferably includes difference information of linear prediction coefficients relative to the low-frequency linear prediction coefficients.

In the speech decoding device of the present invention, the difference information preferably represents a difference between linear prediction coefficients in any one of the following domains: LSP (Linear Spectrum Pair), ISP (Immittance Spectrum Pair), LSF (Linear Spectrum Frequency), ISF (Immittance Spectrum Frequency), or PARCOR coefficients.

In the speech decoding device of the present invention, preferably the low-frequency temporal envelope analysis means performs linear prediction analysis in the frequency direction on the low-frequency component transformed into the frequency domain by the frequency transform means to obtain the low-frequency linear prediction coefficients, and also obtains the temporal envelope information of the speech signal by obtaining the power of each time slot of that frequency-domain low-frequency component; the temporal envelope adjusting means adjusts the low-frequency linear prediction coefficients using the temporal envelope auxiliary information and also adjusts the temporal envelope information using the temporal envelope auxiliary information; and the temporal envelope deforming means deforms the temporal envelope of the speech signal by applying linear prediction filtering in the frequency direction, using the linear prediction coefficients adjusted by the temporal envelope adjusting means, to the frequency-domain high-frequency component generated by the high-frequency generating means, and also deforms the temporal envelope of the high-frequency component by superimposing the temporal envelope information adjusted by the temporal envelope adjusting means on that frequency-domain high-frequency component.

In the speech decoding device of the present invention, preferably the low-frequency temporal envelope analysis means performs linear prediction analysis in the frequency direction on the low-frequency component transformed into the frequency domain by the frequency transform means to obtain the low-frequency linear prediction coefficients, and also obtains the temporal envelope information of the speech signal by obtaining the power of each QMF subband sample of that frequency-domain low-frequency component; the temporal envelope adjusting means adjusts the low-frequency linear prediction coefficients using the temporal envelope auxiliary information and also adjusts the temporal envelope information using the temporal envelope auxiliary information; and the temporal envelope deforming means deforms the temporal envelope of the speech signal by applying linear prediction filtering in the frequency direction, using the linear prediction coefficients adjusted by the temporal envelope adjusting means, to the frequency-domain high-frequency component generated by the high-frequency generating means, and also deforms the temporal envelope of the high-frequency component by multiplying that frequency-domain high-frequency component by the temporal envelope information adjusted by the temporal envelope adjusting means.

In the speech decoding device of the present invention, the temporal envelope auxiliary information preferably represents a parameter indicating both the filter strength of the linear prediction coefficients and the magnitude of the temporal variation of the temporal envelope information.

The speech decoding device of the present invention is a speech decoding device that decodes an encoded speech signal, comprising: bitstream separating means for separating an external bitstream containing the encoded speech signal into an encoded bitstream and linear prediction coefficients; linear prediction coefficient interpolation/extrapolation means for interpolating or extrapolating the linear prediction coefficients in the time direction; and temporal envelope deforming means for deforming the temporal envelope of the speech signal by applying linear prediction filtering in the frequency direction, using the linear prediction coefficients interpolated or extrapolated by the linear prediction coefficient interpolation/extrapolation means, to a high-frequency component represented in the frequency domain.
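The pairing of encoder-side decimation in the time direction with decoder-side interpolation or extrapolation can be sketched as below. Linear interpolation between transmitted coefficient sets and holding the nearest set outside the transmitted range are assumptions chosen for simplicity, as are the function names. Interpolating direct-form coefficients does not guarantee a stable filter, so a practical implementation would more likely interpolate in a domain such as LSP or PARCOR.

```python
def decimate(coeff_sets, step):
    """Keep every step-th time slot; returns {slot_index: coefficients}."""
    return {t: coeff_sets[t] for t in range(0, len(coeff_sets), step)}


def interpolate(known, num_slots):
    """Rebuild one coefficient set per slot from the transmitted ones.

    Between transmitted slots: linear interpolation.
    Before the first or after the last: hold the nearest set (extrapolation).
    """
    slots = sorted(known)
    out = []
    for t in range(num_slots):
        if t <= slots[0]:
            out.append(list(known[slots[0]]))
        elif t >= slots[-1]:
            out.append(list(known[slots[-1]]))
        else:
            lo = max(s for s in slots if s <= t)
            hi = min(s for s in slots if s >= t)
            if lo == hi:
                out.append(list(known[lo]))
            else:
                w = (t - lo) / (hi - lo)
                out.append([(1 - w) * x + w * y
                            for x, y in zip(known[lo], known[hi])])
    return out
```

Sending coefficients for only every `step`-th slot cuts the side-information rate roughly by that factor, at the cost of smoothing fast changes in the envelope.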

The speech encoding method of the present invention is a speech encoding method using a speech encoding device that encodes a speech signal, comprising: a core encoding step in which the speech encoding device encodes a low-frequency component of the speech signal; a temporal envelope auxiliary information calculating step in which the speech encoding device calculates, using the temporal envelope of the low-frequency component of the speech signal, temporal envelope auxiliary information for obtaining an approximation of the temporal envelope of a high-frequency component of the speech signal; and a bitstream multiplexing step in which the speech encoding device generates a bitstream in which at least the low-frequency component encoded in the core encoding step and the temporal envelope auxiliary information calculated in the temporal envelope auxiliary information calculating step are multiplexed.

The speech encoding method of the present invention is a speech encoding method using a speech encoding device that encodes a speech signal, comprising: a core encoding step in which the speech encoding device encodes a low-frequency component of the speech signal; a frequency transform step in which the speech encoding device transforms the speech signal into the frequency domain; a linear prediction analysis step in which the speech encoding device performs linear prediction analysis in the frequency direction on the high-frequency-side coefficients of the speech signal transformed into the frequency domain in the frequency transform step to obtain high-frequency linear prediction coefficients; a prediction coefficient decimation step in which the speech encoding device decimates, in the time direction, the high-frequency linear prediction coefficients obtained in the linear prediction analysis step; a prediction coefficient quantization step in which the speech encoding device quantizes the high-frequency linear prediction coefficients after decimation in the prediction coefficient decimation step; and a bitstream multiplexing step in which the speech encoding device generates a bitstream in which at least the low-frequency component encoded in the core encoding step and the high-frequency linear prediction coefficients quantized in the prediction coefficient quantization step are multiplexed.

The speech decoding method of the present invention is a speech decoding method using a speech decoding device that decodes an encoded speech signal, comprising: a bitstream separating step in which the speech decoding device separates an external bitstream containing the encoded speech signal into an encoded bitstream and temporal envelope auxiliary information; a core decoding step in which the speech decoding device decodes the encoded bitstream separated in the bitstream separating step to obtain a low-frequency component; a frequency transform step in which the speech decoding device transforms the low-frequency component obtained in the core decoding step into the frequency domain; a high-frequency generating step in which the speech decoding device generates a high-frequency component by copying the low-frequency component transformed into the frequency domain in the frequency transform step from the low-frequency band to the high-frequency band; a low-frequency temporal envelope analysis step in which the speech decoding device analyzes the low-frequency component transformed into the frequency domain in the frequency transform step to obtain temporal envelope information; a temporal envelope adjusting step in which the speech decoding device adjusts the temporal envelope information obtained in the low-frequency temporal envelope analysis step using the temporal envelope auxiliary information; and a temporal envelope deforming step in which the speech decoding device deforms the temporal envelope of the high-frequency component generated in the high-frequency generating step using the temporal envelope information adjusted in the temporal envelope adjusting step.

The speech decoding method of the present invention is a speech decoding method using a speech decoding device that decodes an encoded speech signal, the method comprising: a bitstream separation step in which the speech decoding device separates an external bitstream containing the encoded speech signal into an encoded bitstream and linear prediction coefficients; a linear prediction coefficient interpolation/extrapolation step in which the speech decoding device interpolates or extrapolates the linear prediction coefficients in the time direction; and a time envelope deformation step in which the speech decoding device deforms the time envelope of the speech signal by applying linear prediction filtering in the frequency direction to a high-frequency component expressed in the frequency domain, using the linear prediction coefficients interpolated or extrapolated in the linear prediction coefficient interpolation/extrapolation step.
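The interpolation or extrapolation of linear prediction coefficients in the time direction, which is the decoder-side counterpart of the thinning performed by the encoder, might look roughly as follows. This is only a sketch under stated assumptions: the function names, the fixed thinning interval `step`, linear interpolation applied directly to the coefficients, and holding the nearest transmitted vector for extrapolation are all illustrative choices that the text does not prescribe.

```python
def thin_coefficients(coeffs, step):
    """Encoder side: keep the coefficient vectors of time slots 0, step, 2*step, ...

    coeffs[r] is the coefficient vector of time slot r (hypothetical layout).
    """
    return {r: list(coeffs[r]) for r in range(0, len(coeffs), step)}


def interpolate_coefficients(kept, num_slots):
    """Decoder side: rebuild one coefficient vector per time slot."""
    slots = sorted(kept)
    out = []
    for r in range(num_slots):
        if r in kept:
            out.append(list(kept[r]))            # transmitted slot: use as is
        elif r < slots[0]:
            out.append(list(kept[slots[0]]))     # extrapolate by holding the first vector
        elif r > slots[-1]:
            out.append(list(kept[slots[-1]]))    # extrapolate by holding the last vector
        else:
            r0 = max(s for s in slots if s < r)  # nearest transmitted slot before r
            r1 = min(s for s in slots if s > r)  # nearest transmitted slot after r
            w = (r - r0) / (r1 - r0)
            out.append([(1 - w) * a + w * b
                        for a, b in zip(kept[r0], kept[r1])])
    return out
```

Speech codecs often interpolate in another representation of the coefficients, such as line spectral pairs, because direct interpolation does not guarantee a stable filter; the sketch ignores that concern.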

The speech encoding program of the present invention, in order to encode a speech signal, causes a computer device to function as: core encoding means for encoding a low-frequency component of the speech signal; time envelope auxiliary information calculating means for calculating, using the time envelope of the low-frequency component of the speech signal, time envelope auxiliary information for obtaining an approximation of the time envelope of the high-frequency component of the speech signal; and bitstream multiplexing means for generating a bitstream in which at least the low-frequency component encoded by the core encoding means and the time envelope auxiliary information calculated by the time envelope auxiliary information calculating means are multiplexed.

The speech encoding program of the present invention, in order to encode a speech signal, causes a computer device to function as: core encoding means for encoding a low-frequency component of the speech signal; frequency conversion means for converting the speech signal into the frequency domain; linear prediction analysis means for performing linear prediction analysis in the frequency direction on the high-frequency-side coefficients of the speech signal converted into the frequency domain by the frequency conversion means, thereby obtaining high-frequency linear prediction coefficients; prediction coefficient thinning means for thinning out, in the time direction, the high-frequency linear prediction coefficients obtained by the linear prediction analysis means; prediction coefficient quantization means for quantizing the high-frequency linear prediction coefficients remaining after the thinning by the prediction coefficient thinning means; and bitstream multiplexing means for generating a bitstream in which at least the low-frequency component encoded by the core encoding means and the high-frequency linear prediction coefficients quantized by the prediction coefficient quantization means are multiplexed.

The speech decoding program of the present invention, in order to decode an encoded speech signal, causes a computer device to function as: bitstream separation means for separating an external bitstream containing the encoded speech signal into an encoded bitstream and time envelope auxiliary information; core decoding means for decoding the encoded bitstream separated by the bitstream separation means to obtain a low-frequency component; frequency conversion means for converting the low-frequency component obtained by the core decoding means into the frequency domain; high-frequency generating means for generating a high-frequency component by copying the low-frequency component converted into the frequency domain by the frequency conversion means from a low-frequency band to a high-frequency band; low-frequency time envelope analysis means for analyzing the low-frequency component converted into the frequency domain by the frequency conversion means to obtain time envelope information; time envelope adjusting means for adjusting the time envelope information obtained by the low-frequency time envelope analysis means by using the time envelope auxiliary information; and time envelope deforming means for deforming the time envelope of the high-frequency component generated by the high-frequency generating means by using the time envelope information adjusted by the time envelope adjusting means.

The speech decoding program of the present invention, in order to decode an encoded speech signal, causes a computer device to function as: bitstream separation means for separating an external bitstream containing the encoded speech signal into an encoded bitstream and linear prediction coefficients; linear prediction coefficient interpolation/extrapolation means for interpolating or extrapolating the linear prediction coefficients in the time direction; and time envelope deforming means for deforming the time envelope of the speech signal by applying linear prediction filtering in the frequency direction to a high-frequency component expressed in the frequency domain, using the linear prediction coefficients interpolated or extrapolated by the linear prediction coefficient interpolation/extrapolation means.

In the speech decoding device of the present invention, the time envelope deforming means preferably performs linear prediction filtering in the frequency direction on the frequency-domain high-frequency component generated by the high-frequency generating means, and then adjusts the power of the high-frequency component resulting from the linear prediction filtering to a value equal to the power before the linear prediction filtering.

In the speech decoding device of the present invention, the time envelope deforming means preferably performs linear prediction filtering in the frequency direction on the frequency-domain high-frequency component generated by the high-frequency generating means, and then adjusts the power within an arbitrary frequency range of the high-frequency component resulting from the linear prediction filtering to a value equal to the power before the linear prediction filtering.
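The power adjustment after linear prediction filtering, either over the whole high band or over an arbitrary frequency range, can be sketched as a single gain computation. The function name `restore_power`, the half-open range `[lo, hi)`, and the choice to scale only the samples inside that range are assumptions made for illustration; the text only states that the power is made equal to its value before filtering.

```python
import math


def restore_power(filtered, original, lo=0, hi=None):
    """Scale `filtered` so that its power in [lo, hi) matches `original`.

    Both arguments are sequences of QMF samples over the frequency index k
    for a single time slot. With the default range the whole band is matched.
    """
    hi = len(original) if hi is None else hi
    p_orig = sum(abs(x) ** 2 for x in original[lo:hi])
    p_filt = sum(abs(x) ** 2 for x in filtered[lo:hi])
    if p_filt == 0.0:
        return list(filtered)          # nothing to scale
    g = math.sqrt(p_orig / p_filt)     # amplitude gain that equalizes the power
    # Samples outside the range are left untouched (an assumption).
    return [g * x if lo <= k < hi else x for k, x in enumerate(filtered)]
```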

In the speech decoding device of the present invention, the time envelope auxiliary information is preferably the ratio between the minimum value and the mean value of the adjusted time envelope information.

In the speech decoding device of the present invention, the time envelope deforming means preferably controls the gain of the adjusted time envelope so that the power of the frequency-domain high-frequency component within an SBR envelope time segment is equal before and after the deformation of the time envelope, and then deforms the time envelope of the high-frequency component by multiplying the frequency-domain high-frequency component by the gain-controlled time envelope.
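A minimal sketch of this gain-controlled envelope multiplication is given here, assuming the high-frequency QMF samples of one SBR envelope time segment are stored as `q_high[r][k]` (time slot r, frequency index k) and the adjusted time envelope as one real gain per time slot. The function name and data layout are hypothetical.

```python
import math


def apply_time_envelope(q_high, envelope):
    """Multiply each time slot by its envelope gain while preserving segment power."""
    # Power of the segment before the envelope is applied.
    p_before = sum(abs(x) ** 2 for slot in q_high for x in slot)
    # Power the segment would have if the raw envelope were applied.
    p_after = sum((e * e) * sum(abs(x) ** 2 for x in slot)
                  for slot, e in zip(q_high, envelope))
    # Common gain that makes the power equal before and after the deformation.
    g = math.sqrt(p_before / p_after) if p_after > 0.0 else 1.0
    return [[g * e * x for x in slot] for slot, e in zip(q_high, envelope)]
```

The common gain leaves the shape of the envelope unchanged and only rescales it, so the relative emphasis between time slots is kept.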

In the speech decoding device of the present invention, the low-frequency time envelope analysis means preferably obtains the power of each QMF subband sample of the low-frequency component converted into the frequency domain by the frequency conversion means, and normalizes the power of each QMF subband sample by the average power within the SBR envelope time segment, thereby obtaining time envelope information expressed as gain coefficients to be multiplied with the respective QMF subband samples.
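The normalization described above can be illustrated with a short sketch. It assumes that the power of a QMF subband sample means the power of one time slot summed over the low-band frequency indices, and that the gain coefficient is the square root of the normalized power so that it can be applied to amplitudes; both are interpretations, and the name `low_band_time_envelope` and the layout `q_low[r][k]` are hypothetical.

```python
import math


def low_band_time_envelope(q_low):
    """Return one gain coefficient e(r) per time slot of an SBR envelope time segment.

    q_low[r][k] holds the low-band QMF samples of time slot r.
    """
    power = [sum(abs(x) ** 2 for x in slot) for slot in q_low]
    mean_power = sum(power) / len(power)
    if mean_power == 0.0:
        return [1.0] * len(power)      # silent segment: flat envelope (assumption)
    # Normalizing by the segment average makes the mean of e(r)^2 equal to 1.
    return [math.sqrt(p / mean_power) for p in power]
```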

The speech decoding device of the present invention is a speech decoding device that decodes an encoded speech signal, comprising: core decoding means for decoding an external bitstream containing the encoded speech signal to obtain a low-frequency component; frequency conversion means for converting the low-frequency component obtained by the core decoding means into the frequency domain; high-frequency generating means for generating a high-frequency component by copying the low-frequency component converted into the frequency domain by the frequency conversion means from a low-frequency band to a high-frequency band; low-frequency time envelope analysis means for analyzing the low-frequency component converted into the frequency domain by the frequency conversion means to obtain time envelope information; a time envelope auxiliary information generating unit for analyzing the bitstream to generate time envelope auxiliary information; time envelope adjusting means for adjusting the time envelope information obtained by the low-frequency time envelope analysis means by using the time envelope auxiliary information; and time envelope deforming means for deforming the time envelope of the high-frequency component generated by the high-frequency generating means by using the time envelope information adjusted by the time envelope adjusting means.

The speech decoding device of the present invention preferably comprises primary high-frequency adjusting means and secondary high-frequency adjusting means, which together correspond to the high-frequency adjusting means, wherein the primary high-frequency adjusting means executes processing that includes part of the processing corresponding to the high-frequency adjusting means, the time envelope deforming means deforms the time envelope of the output signal of the primary high-frequency adjusting means, and the secondary high-frequency adjusting means executes, on the output signal of the time envelope deforming means, that part of the processing corresponding to the high-frequency adjusting means which is not executed by the primary high-frequency adjusting means. The processing of the secondary high-frequency adjusting means is preferably the sinusoid addition processing in the SBR decoding process.

According to the present invention, in a frequency-domain bandwidth extension technique typified by SBR, the pre-echo and post-echo that occur can be reduced and the subjective quality of the decoded signal can be improved without significantly increasing the bit rate.

第１の実施形態に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on 1st Embodiment. 第１の実施形態に係る音声符号化装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the audio | voice coding apparatus which concerns on 1st Embodiment. 第１の実施形態に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on 1st Embodiment. 第１の実施形態に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on 1st Embodiment. 第１の実施形態の変形例１に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the modification 1 of 1st Embodiment. 第２の実施形態に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on 2nd Embodiment. 第２の実施形態に係る音声符号化装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the audio | voice coding apparatus which concerns on 2nd Embodiment. 第２の実施形態に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on 2nd Embodiment. 第２の実施形態に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on 2nd Embodiment. 第３の実施形態に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on 3rd Embodiment. 第３の実施形態に係る音声符号化装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the audio | voice coding apparatus which concerns on 3rd Embodiment. 第３の実施形態に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on 3rd Embodiment. 第３の実施形態に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on 3rd Embodiment. 
第４の実施形態に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on 4th Embodiment. 第４の実施形態の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第１の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on the other modification of 1st Embodiment. 第１の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 1st Embodiment. 第１の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on the other modification of 1st Embodiment. 第１の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 1st Embodiment. 第２の実施形態の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on the modification of 2nd Embodiment. 第２の実施形態の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the modification of 2nd Embodiment. 第２の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the audio | voice decoding apparatus which concerns on the other modification of 2nd Embodiment. 
第２の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 2nd Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 
第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の構成を示す図である。It is a figure which shows the structure of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声復号装置の動作を説明するためのフローチャートである。It is a flowchart for demonstrating operation | movement of the speech decoding apparatus which concerns on the other modification of 4th Embodiment. 第１の実施形態の他の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the other modification of 1st Embodiment. 第１の実施形態の他の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the other modification of 1st Embodiment. 第２の実施形態の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the modification of 2nd Embodiment. 
第２の実施形態の他の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the other modification of 2nd Embodiment. 第４の実施形態に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on 4th Embodiment. 第４の実施形態の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the modification of 4th Embodiment. 第４の実施形態の他の変形例に係る音声符号化装置の構成を示す図である。It is a figure which shows the structure of the audio | voice coding apparatus which concerns on the other modification of 4th Embodiment.

Preferred embodiments of the present invention are described in detail below with reference to the drawings. In the description of the drawings, the same elements are denoted by the same reference numerals wherever possible, and redundant description is omitted.

(First embodiment)
FIG. 1 is a diagram showing the configuration of the speech encoding device 11 according to the first embodiment. Physically, the speech encoding device 11 includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device 11 in an integrated manner by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 2), into the RAM and executing it. The communication device of the speech encoding device 11 receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bitstream to the outside.

Functionally, the speech encoding device 11 includes a frequency conversion unit 1a (frequency conversion means), a frequency inverse conversion unit 1b, a core codec encoding unit 1c (core encoding means), an SBR encoding unit 1d, a linear prediction analysis unit 1e (time envelope auxiliary information calculating means), a filter strength parameter calculation unit 1f (time envelope auxiliary information calculating means), and a bitstream multiplexing unit 1g (bitstream multiplexing means). The frequency conversion unit 1a through the bitstream multiplexing unit 1g of the speech encoding device 11 shown in FIG. 1 are functions realized when the CPU of the speech encoding device 11 executes the computer program stored in the built-in memory of the speech encoding device 11. By executing this computer program, the CPU sequentially executes the processing shown in the flowchart of FIG. 2 (steps Sa1 to Sa7), using the frequency conversion unit 1a through the bitstream multiplexing unit 1g shown in FIG. 1. All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech encoding device 11, such as the ROM or the RAM.

The frequency conversion unit 1a analyzes the external input signal received via the communication device of the speech encoding device 11 with a multi-band QMF filter bank to obtain a QMF-domain signal q(k, r) (step Sa1). Here, k (0 ≤ k ≤ 63) is the index in the frequency direction and r is the index of the time slot. The frequency inverse conversion unit 1b synthesizes, with a QMF filter bank, the lower-frequency half of the coefficients of the QMF-domain signal obtained from the frequency conversion unit 1a, and obtains a downsampled time-domain signal containing only the low-frequency component of the input signal (step Sa2). The core codec encoding unit 1c encodes the downsampled time-domain signal to obtain an encoded bitstream (step Sa3). The encoding in the core codec encoding unit 1c may be based on a speech coding scheme typified by CELP, or on an audio coding scheme such as transform coding typified by AAC or TCX (Transform Coded Excitation).

The SBR encoding unit 1d receives the QMF-domain signal from the frequency conversion unit 1a and performs SBR encoding based on analysis of the power, signal variation, tonality, and so on of the high-frequency component, thereby obtaining SBR auxiliary information (step Sa4). The QMF analysis method used in the frequency conversion unit 1a and the SBR encoding method used in the SBR encoding unit 1d are described in detail in, for example, "3GPP TS 26.404; Enhanced aacPlus encoder SBR part".

The linear prediction analysis unit 1e receives the QMF-domain signal from the frequency conversion unit 1a and performs linear prediction analysis in the frequency direction on the high-frequency component of this signal to obtain high-frequency linear prediction coefficients a_H(n, r) (1 ≤ n ≤ N) (step Sa5). Here, N is the linear prediction order, and the index r is the time-direction index of the subsamples of the QMF-domain signal. Either the covariance method or the autocorrelation method can be used for the linear prediction analysis. The analysis that yields a_H(n, r) is performed on the high-frequency components of q(k, r) that satisfy k_x < k ≤ 63, where k_x is the frequency index corresponding to the upper-limit frequency of the band encoded by the core codec encoding unit 1c. The linear prediction analysis unit 1e may also perform linear prediction analysis on low-frequency components, separate from those analyzed to obtain a_H(n, r), and obtain low-frequency linear prediction coefficients a_L(n, r) distinct from a_H(n, r) (linear prediction coefficients for such low-frequency components correspond to the time envelope information; the same applies throughout the first embodiment). The analysis that yields a_L(n, r) is performed on the low-frequency components satisfying 0 ≤ k < k_x, and may be restricted to part of the frequency band within the interval 0 ≤ k < k_x.
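To make the frequency-direction analysis concrete, the following sketch applies the autocorrelation method with a Levinson-Durbin recursion to the samples k_x < k ≤ 63 of a single time slot. It is an illustration under assumptions rather than the procedure of the specification: the function names are hypothetical, the sign convention e(k) = q(k) + Σ a(n) q(k − n) is chosen here, and no windowing or lag weighting is applied.

```python
def autocorr(x, max_lag):
    """r(m) = sum over k of x(k) * conj(x(k - m)), for m = 0 .. max_lag."""
    return [sum(x[k] * x[k - m].conjugate() for k in range(m, len(x)))
            for m in range(max_lag + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion for complex data.

    Returns (a, err): a[n - 1] is the coefficient a(n) for n = 1 .. order,
    err is the energy of the prediction residual.
    """
    a = [1.0 + 0j]
    err = r[0].real
    for m in range(1, order + 1):
        if err <= 0.0:
            a.append(0j)                 # degenerate input: pad with zeros
            continue
        acc = r[m] + sum(a[n] * r[m - n] for n in range(1, m))
        refl = -acc / err                # reflection coefficient of stage m
        a = ([a[0]]
             + [a[n] + refl * a[m - n].conjugate() for n in range(1, m)]
             + [refl])
        err *= 1.0 - abs(refl) ** 2
    return a[1:], err


def lpc_over_frequency(q_slot, k_x, order):
    """Analyze the high band k_x < k <= 63 of one time slot along frequency."""
    high_band = q_slot[k_x + 1:]
    return levinson(autocorr(high_band, order), order)
```

Dividing the zero-lag autocorrelation by the returned residual energy gives a prediction gain, the quantity from which the filter strength parameter is derived.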

The filter strength parameter calculation unit 1f calculates a filter strength parameter using, for example, the linear prediction coefficients obtained by the linear prediction analysis unit 1e (step Sa6); the filter strength parameter corresponds to the time envelope auxiliary information, and the same applies throughout the first embodiment. First, the prediction gain G_H(r) is calculated from a_H(n, r). A method of calculating the prediction gain is described in detail in, for example, "Speech Coding" by Takehiro Moriya, edited by the Institute of Electronics, Information and Communication Engineers. If a_L(n, r) has been calculated, the prediction gain G_L(r) is calculated in the same way. The filter strength parameter K(r) is a parameter that increases as G_H(r) increases and can be obtained, for example, according to the following formula (1), where max(a, b) denotes the larger of a and b and min(a, b) denotes the smaller of a and b.
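Formula (1) itself is not reproduced in this text. The sketch below therefore assumes one clamped form that matches the stated properties (K(r) grows with G_H(r), and max and min are used to bound it): K = max(0, min(1, G_H − 1)). This particular expression, the bounds 0 and 1, and the function names are assumptions for illustration only. The prediction gain is taken as the ratio of signal energy to prediction residual energy, which is the usual definition.

```python
def prediction_gain(signal_energy, residual_energy):
    """Ratio of the signal energy to the energy of the prediction residual."""
    if residual_energy <= 0.0:
        return float("inf")            # perfectly predictable signal
    return signal_energy / residual_energy


def filter_strength(g_h):
    """Assumed clamped mapping from prediction gain to K in [0, 1]."""
    # A gain of 1 (no predictability) maps to 0; gains of 2 or more saturate at 1.
    return max(0.0, min(1.0, g_h - 1.0))
```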

If G_L(r) has been calculated, K(r) can be obtained as a parameter that increases as G_H(r) increases and decreases as G_L(r) increases. In this case, K(r) can be obtained, for example, according to the following formula (2).
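Formula (2) is likewise not reproduced here. One form with the stated behavior, larger for larger G_H(r) and smaller for larger G_L(r), is a clamped ratio of the two gains; the expression K = max(0, min(1, G_H / G_L − 1)) used below is an assumption for illustration, as is the function name.

```python
def filter_strength_from_gains(g_h, g_l):
    """Assumed mapping: K rises with the high-band gain and falls with the low-band gain."""
    if g_l <= 0.0:
        raise ValueError("g_l must be positive")
    return max(0.0, min(1.0, g_h / g_l - 1.0))
```

With this form, K is zero whenever the high band is no more predictable along frequency than the low band, which corresponds to a case where the envelope copied from the low band is already sharp enough.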

K(r) is a parameter indicating the strength with which the time envelope of the high-frequency component is adjusted during SBR decoding. The prediction gain of the linear prediction coefficients in the frequency direction takes a larger value the more steeply the time envelope of the signal in the analysis interval changes. K(r) is a parameter that instructs the decoder to strengthen, as its value increases, the processing that sharpens the change in the time envelope of the high-frequency component generated by SBR. K(r) may also be a parameter that instructs the decoder (for example, the speech decoding device 21) to weaken, as its value decreases, the processing that sharpens the time envelope of the high-frequency component generated by SBR, and may include a value indicating that the sharpening processing is not to be executed. Instead of transmitting K(r) for every time slot, a single K(r) representing a plurality of time slots may be transmitted. To determine the interval of time slots that share the same value of K(r), it is desirable to use the SBR envelope time border information contained in the SBR auxiliary information.

K(r) is quantized and then transmitted to the bit stream multiplexing unit 1g. Before quantization, it is desirable to calculate a K(r) that represents a plurality of time slots, for example by averaging K(r) over those time slots r. When a K(r) representing a plurality of time slots is transmitted, K(r) need not be calculated independently from the analysis of each individual time slot as in formula (2); instead, the representative K(r) may be obtained from the analysis result of the entire interval consisting of the plurality of time slots. In this case, K(r) can be calculated, for example, according to formula (3), where mean(·) denotes the average over the interval of time slots represented by K(r).
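The averaging of K(r) over time slots that share one transmitted value can be sketched as follows. The segment boundaries stand in for the SBR envelope time borders; the list-of-borders representation and the function name representative_k are assumptions made for illustration, and quantization is left out.

```python
from typing import List, Sequence


def representative_k(k_per_slot: Sequence[float],
                     borders: Sequence[int]) -> List[float]:
    """Average K(r) inside each segment [borders[i], borders[i + 1]).

    `borders` is assumed to be strictly increasing, so no segment is empty.
    """
    reps = []
    for start, end in zip(borders[:-1], borders[1:]):
        segment = k_per_slot[start:end]
        reps.append(sum(segment) / len(segment))
    return reps


# Five time slots grouped into two segments: slots 0-2 and slots 3-4.
print(representative_k([0.0, 0.5, 1.0, 0.25, 0.75], [0, 3, 5]))
```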

When K(r) is transmitted, it may be transmitted exclusively with the inverse filter mode information included in the SBR auxiliary information described in “ISO/IEC 14496-3 subpart 4 General Audio Coding”. That is, K(r) need not be transmitted for time slots for which the inverse filter mode information of the SBR auxiliary information (bs_invf_mode in “ISO/IEC 14496-3 subpart 4 General Audio Coding”) is transmitted, and the inverse filter mode information need not be transmitted for time slots for which K(r) is transmitted. Information indicating which of K(r) and the inverse filter mode information is transmitted may be added. Alternatively, K(r) and the inverse filter mode information included in the SBR auxiliary information may be combined and handled as a single vector, and this vector may be entropy coded. In that case, the combinations of values that K(r) and the inverse filter mode information may take can be restricted.

The bit stream multiplexing unit 1g multiplexes the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and K(r) calculated by the filter strength parameter calculation unit 1f, and outputs the multiplexed bit stream (encoded multiplexed bit stream) via the communication device of the speech encoding device 11 (processing of step Sa7).

FIG. 3 is a diagram illustrating the configuration of the speech decoding device 21 according to the first embodiment. The speech decoding device 21 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU comprehensively controls the speech decoding device 21 by loading a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 4) stored in a built-in memory of the speech decoding device 21, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device 21 receives the encoded multiplexed bit stream output from the speech encoding device 11, the speech encoding device 11a of Modification 1 described later, or the speech encoding device of Modification 2 described later, and outputs the decoded speech signal to the outside. As shown in FIG. 3, the speech decoding device 21 functionally includes a bit stream separation unit 2a (bit stream separation means), a core codec decoding unit 2b (core decoding means), a frequency conversion unit 2c (frequency conversion means), a low frequency linear prediction analysis unit 2d (low frequency time envelope analysis means), a signal change detection unit 2e, a filter strength adjustment unit 2f (time envelope adjustment means), a high frequency generation unit 2g (high frequency generation means), a high frequency linear prediction analysis unit 2h, a linear prediction inverse filter unit 2i, a high frequency adjustment unit 2j (high frequency adjustment means), a linear prediction filter unit 2k (time envelope deformation means), a coefficient addition unit 2m, and a frequency inverse conversion unit 2n. The units from the bit stream separation unit 2a to the frequency inverse conversion unit 2n shown in FIG. 3 are functions realized by the CPU of the speech decoding device 21 executing the computer program stored in the built-in memory of the speech decoding device 21. By executing this computer program, the CPU of the speech decoding device 21 sequentially executes the processing shown in the flowchart of FIG. 4 (steps Sb1 to Sb11), using the units from the bit stream separation unit 2a to the frequency inverse conversion unit 2n shown in FIG. 3. All data required to execute this computer program and all data generated by its execution are stored in a built-in memory of the speech decoding device 21 such as the ROM or the RAM.

The bit stream separation unit 2a separates the multiplexed bit stream input via the communication device of the speech decoding device 21 into the filter strength parameter, the SBR auxiliary information, and the encoded bit stream. The core codec decoding unit 2b decodes the encoded bit stream supplied from the bit stream separation unit 2a and obtains a decoded signal containing only the low-frequency component (processing of step Sb1). The decoding may be based on a speech coding scheme typified by CELP, or on an audio coding scheme such as AAC or TCX (Transform Coded Excitation).

The frequency conversion unit 2c analyzes the decoded signal supplied from the core codec decoding unit 2b with a multi-band QMF filter bank and obtains a QMF-domain signal q_dec(k, r) (processing of step Sb2). Here, k (0 ≤ k ≤ 63) is the index in the frequency direction, and r is the index in the time direction of the subsamples of the QMF-domain signal.

The low frequency linear prediction analysis unit 2d performs linear prediction analysis in the frequency direction on q_dec(k, r) obtained from the frequency conversion unit 2c for each time slot r, and obtains low frequency linear prediction coefficients a_dec(n, r) (processing of step Sb3). The linear prediction analysis is performed over the range 0 ≤ k < k_x, which corresponds to the signal band of the decoded signal obtained from the core codec decoding unit 2b. The analysis may also be performed on only part of the frequency band contained in the interval 0 ≤ k < k_x.
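The frequency-direction linear prediction analysis of one time slot can be sketched with the autocorrelation method and the Levinson-Durbin recursion applied to the complex QMF coefficients of that slot. This is an illustration under assumptions that are not taken from the document: the prediction-error convention e(k) = x(k) + Σ a(n)·x(k − n), the absence of any windowing, and the function names autocorrelation and levinson_durbin.

```python
def autocorrelation(x, order):
    """r[m] = sum_k x[k] * conj(x[k - m]) over one time slot, m = 0..order."""
    return [sum(x[k] * x[k - m].conjugate() for k in range(m, len(x)))
            for m in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a[0..order] (a[0] = 1).

    Returns the coefficients and the prediction gain r[0] / E, where E is
    the residual energy after prediction.
    """
    a = [1.0 + 0j] + [0j] * order
    energy = r[0].real
    if energy <= 0.0:          # silent slot: nothing to predict
        return a, 1.0
    r0 = energy
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        refl = -acc / energy   # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + refl * prev[i - j].conjugate()
        a[i] = refl
        energy *= 1.0 - abs(refl) ** 2
        if energy <= 0.0:      # perfectly predictable: stop early
            break
    gain = r0 / energy if energy > 0.0 else float("inf")
    return a, gain


# One slot of four QMF coefficients, analysed with a first-order predictor.
slot = [1, 1j, -1, -1j]
coeffs, gain = levinson_durbin(autocorrelation(slot, 1), 1)
print(coeffs, gain)
```

The returned gain is the quantity that the encoder-side description refers to as the prediction gain; it grows as the coefficients of the slot become more predictable along the frequency axis.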

The signal change detection unit 2e detects the temporal change of the QMF-domain signal obtained from the frequency conversion unit 2c and outputs it as a detection result T(r). The signal change can be detected, for example, by the following method.
1. The short-time power p(r) of the signal in time slot r is obtained by formula (4).
2. An envelope p_env(r), obtained by smoothing p(r), is obtained by formula (5), where α is a constant satisfying 0 < α < 1.
3. T(r) is obtained from p(r) and p_env(r) according to formula (6), where β is a constant.

The method described above is a simple example of signal change detection based on changes in power; signal change detection may be performed by other, more sophisticated methods. The signal change detection unit 2e may also be omitted.
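The three steps above can be sketched in Python as follows. Because formulas (4) to (6) are not reproduced in this text, the concrete forms used here are assumptions: the power is the sum of squared magnitudes over the slot, the envelope is a first-order recursive average with constant α, and T(r) is the ratio p(r) / (β·p_env(r)) floored at 1. The default values of alpha and beta and the function name are likewise hypothetical.

```python
from typing import List, Sequence


def detect_signal_change(qmf_slots: Sequence[Sequence[complex]],
                         alpha: float = 0.9,
                         beta: float = 1.5) -> List[float]:
    """Return T(r) for each time slot r of a QMF-domain signal."""
    t = []
    p_env = None
    for slot in qmf_slots:
        # Step 1 (assumed form): short-time power of the slot.
        p = sum(abs(c) ** 2 for c in slot)
        # Step 2 (assumed form): recursively smoothed power envelope.
        p_env = p if p_env is None else alpha * p_env + (1.0 - alpha) * p
        # Step 3 (assumed form): T(r) exceeds 1 only at a power onset.
        t.append(max(1.0, p / (beta * p_env)) if p_env > 0.0 else 1.0)
    return t


# Five steady slots followed by a sudden onset in the sixth.
print(detect_signal_change([[1.0]] * 5 + [[10.0]]))
```

With these choices a stationary signal yields T(r) = 1 throughout, and only a slot whose power jumps well above the smoothed envelope produces a larger value.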

The filter strength adjustment unit 2f adjusts the filter strength of a_dec(n, r) obtained from the low frequency linear prediction analysis unit 2d and obtains adjusted linear prediction coefficients a_adj(n, r) (processing of step Sb4). The filter strength can be adjusted using the filter strength parameter K received via the bit stream separation unit 2a, for example according to formula (7).

When the output T(r) of the signal change detection unit 2e is available, the strength may instead be adjusted according to formula (8).
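One common way to scale the strength of a linear prediction filter is to weight the n-th coefficient by the n-th power of a factor, and the sketch below uses that technique. Formulas (7) and (8) are not reproduced in this text, so the weighting a_adj(n) = a_dec(n)·K^n, its extension to (K·T)^n when T(r) is available, and the function name are assumptions rather than the document's stated method.

```python
from typing import List, Sequence


def adjust_filter_strength(a_dec: Sequence[float],
                           k: float,
                           t: float = 1.0) -> List[float]:
    """Scale coefficient n by (k * t) ** n; a_dec[0] (= 1) is left unchanged."""
    rho = k * t
    return [coeff * rho ** n for n, coeff in enumerate(a_dec)]


# K = 0 removes the filter entirely; K = 1 leaves it unchanged.
print(adjust_filter_strength([1.0, -0.8, 0.4], 0.5))
```

Under this assumption, K = 0 reduces the filter to the identity, which matches the statement that K(r) may take a value indicating that the envelope sharpening is not executed.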

The high frequency generation unit 2g copies the QMF-domain signal obtained from the frequency conversion unit 2c from the low-frequency band to the high-frequency band and generates a QMF-domain signal q_exp(k, r) of the high-frequency component (processing of step Sb5). The high-frequency component is generated according to the HF generation method of SBR in “MPEG4 AAC” (“ISO/IEC 14496-3 subpart 4 General Audio Coding”).

The high frequency linear prediction analysis unit 2h performs linear prediction analysis in the frequency direction on q_exp(k, r) generated by the high frequency generation unit 2g for each time slot r, and obtains high frequency linear prediction coefficients a_exp(n, r) (processing of step Sb6). The linear prediction analysis is performed over the range k_x ≤ k ≤ 63, which corresponds to the high-frequency component generated by the high frequency generation unit 2g.

The linear prediction inverse filter unit 2i applies, to the QMF-domain signal in the high-frequency band generated by the high frequency generation unit 2g, linear prediction inverse filter processing in the frequency direction with a_exp(n, r) as coefficients (processing of step Sb7). The transfer function of the linear prediction inverse filter is given by formula (9).

This linear prediction inverse filter processing may proceed from the coefficients on the low-frequency side toward those on the high-frequency side, or in the opposite direction. The linear prediction inverse filter processing temporarily flattens the time envelope of the high-frequency component before the time envelope is deformed in a subsequent stage; the linear prediction inverse filter unit 2i may be omitted. Instead of applying the linear prediction analysis and inverse filter processing to the output of the high frequency generation unit 2g, the linear prediction analysis by the high frequency linear prediction analysis unit 2h and the inverse filter processing by the linear prediction inverse filter unit 2i may be applied to the output of the high frequency adjustment unit 2j described later. Furthermore, the linear prediction coefficients used for the linear prediction inverse filter processing may be a_dec(n, r) or a_adj(n, r) instead of a_exp(n, r). They may also be linear prediction coefficients a_{exp,adj}(n, r) obtained by applying filter strength adjustment to a_exp(n, r). This strength adjustment is performed in the same way as when a_adj(n, r) is obtained, for example according to formula (10).
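The inverse filtering along the frequency axis amounts to an FIR prediction-error filter run across the QMF coefficients of one time slot. The sketch below assumes the transfer function 1 + Σ a(n)·z^(-n), processes from the low-frequency side upward, and treats coefficients below the start of the band as zero; these choices and the function name are assumptions, since formula (9) is not reproduced in this text.

```python
from typing import List, Sequence


def lp_inverse_filter(x: Sequence[complex], a: Sequence[complex]) -> List[complex]:
    """y[k] = x[k] + sum_{n>=1} a[n] * x[k - n], with x[k - n] = 0 for k < n."""
    y = []
    for k in range(len(x)):
        acc = x[k]
        for n in range(1, len(a)):
            if k - n >= 0:
                acc += a[n] * x[k - n]
        y.append(acc)
    return y


# A geometrically decaying sequence is flattened to a single impulse.
print(lp_inverse_filter([1.0, 0.5, 0.25, 0.125], [1.0, -0.5]))
```

Running the loop over the indices in reverse order would give the high-to-low variant mentioned in the text.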

The high frequency adjustment unit 2j adjusts the frequency characteristics and tonality of the high-frequency component in the output of the linear prediction inverse filter unit 2i (processing of step Sb8). This adjustment is performed according to the SBR auxiliary information supplied from the bit stream separation unit 2a. The processing by the high frequency adjustment unit 2j follows the “HF adjustment” step of SBR in “MPEG4 AAC” and adjusts the QMF-domain signal in the high-frequency band by linear prediction inverse filter processing in the time direction, gain adjustment, and noise superposition. The details of these steps are described in “ISO/IEC 14496-3 subpart 4 General Audio Coding”. As described above, the frequency conversion unit 2c, the high frequency generation unit 2g, and the high frequency adjustment unit 2j all operate in conformity with the SBR decoder of “MPEG4 AAC” defined in “ISO/IEC 14496-3”.

The linear prediction filter unit 2k applies linear prediction synthesis filter processing in the frequency direction to the high-frequency component q_adj(n, r) of the QMF-domain signal output from the high frequency adjustment unit 2j, using a_adj(n, r) obtained from the filter strength adjustment unit 2f (processing of step Sb9). The transfer function of the linear prediction synthesis filter processing is given by formula (11).

Through this linear prediction synthesis filter processing, the linear prediction filter unit 2k deforms the time envelope of the high-frequency component generated on the basis of SBR.
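The synthesis filtering in the frequency direction can be sketched as an all-pole recursion across the QMF coefficients of one time slot. The transfer function assumed here is 1 / (1 + Σ a(n)·z^(-n)), with the recursion started from zero state at the lowest index of the band; formula (11) is not reproduced in this text, so this form and the function name are assumptions.

```python
from typing import List, Sequence


def lp_synthesis_filter(x: Sequence[complex], a: Sequence[complex]) -> List[complex]:
    """y[k] = x[k] - sum_{n>=1} a[n] * y[k - n], starting from zero state."""
    y = []
    for k in range(len(x)):
        acc = x[k]
        for n in range(1, len(a)):
            if k - n >= 0:
                acc -= a[n] * y[k - n]
        y.append(acc)
    return y


# An impulse along the frequency axis is spread into a decaying sequence,
# which corresponds to imposing a shaped envelope in the time direction.
print(lp_synthesis_filter([1.0, 0.0, 0.0, 0.0], [1.0, -0.5]))
```

Because filtering along frequency shapes the signal along time, stronger coefficients (larger K(r)) impose a more sharply varying time envelope on the high-frequency component.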

The coefficient addition unit 2m adds the QMF-domain signal containing the low-frequency component output from the frequency conversion unit 2c and the QMF-domain signal containing the high-frequency component output from the linear prediction filter unit 2k, and outputs a QMF-domain signal containing both the low-frequency and high-frequency components (processing of step Sb10).

The frequency inverse conversion unit 2n processes the QMF-domain signal obtained from the coefficient addition unit 2m with a QMF synthesis filter bank. It thereby obtains a decoded time-domain speech signal that includes both the low-frequency component obtained by core codec decoding and the high-frequency component generated by SBR whose time envelope has been deformed by the linear prediction filter, and outputs this speech signal to the outside via the built-in communication device (processing of step Sb11). When K(r) and the inverse filter mode information of the SBR auxiliary information described in “ISO/IEC 14496-3 subpart 4 General Audio Coding” are transmitted exclusively, then for a time slot in which K(r) is transmitted and the inverse filter mode information is not, the frequency inverse conversion unit 2n may generate the inverse filter mode information of that time slot from the inverse filter mode information of at least one of the time slots before and after it, or may set it to a predetermined mode. Conversely, for a time slot in which the inverse filter data of the SBR auxiliary information is transmitted and K(r) is not, the frequency inverse conversion unit 2n may generate K(r) of that time slot from K(r) of at least one of the time slots before and after it, or may set it to a predetermined value. The frequency inverse conversion unit 2n may determine whether the transmitted information is K(r) or the inverse filter mode information of the SBR auxiliary information on the basis of information indicating which of the two was transmitted.
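The handling of time slots for which K(r) was not transmitted can be sketched as below. The text only states that neighbouring slots or a predetermined value may be used; averaging the available neighbours, the default of 0.0, the use of None to mark a missing value, and the function name are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def fill_missing_k(k_values: Sequence[Optional[float]],
                   default: float = 0.0) -> List[float]:
    """Replace None entries by the mean of transmitted neighbours, else `default`."""
    filled = []
    for i, k in enumerate(k_values):
        if k is None:
            neighbours = [k_values[j] for j in (i - 1, i + 1)
                          if 0 <= j < len(k_values) and k_values[j] is not None]
            k = sum(neighbours) / len(neighbours) if neighbours else default
        filled.append(k)
    return filled


# Slots 1, 3 and 4 carried inverse filter mode information instead of K(r).
print(fill_missing_k([0.5, None, 1.0, None, None]))
```

The same pattern could be applied in the other direction, to slots that carry K(r) but no inverse filter mode information.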

(Modification 1 of the first embodiment)
FIG. 5 is a diagram illustrating the configuration of a modification (speech encoding device 11a) of the speech encoding device according to the first embodiment. The speech encoding device 11a physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU comprehensively controls the speech encoding device 11a by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11a, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device 11a receives a speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside.

As shown in FIG. 5, the speech encoding device 11a functionally includes, in place of the linear prediction analysis unit 1e, the filter strength parameter calculation unit 1f, and the bit stream multiplexing unit 1g of the speech encoding device 11, a high frequency inverse frequency conversion unit 1h, a short-time power calculation unit 1i (time envelope auxiliary information calculation means), a filter strength parameter calculation unit 1f1 (time envelope auxiliary information calculation means), and a bit stream multiplexing unit 1g1 (bit stream multiplexing means). The bit stream multiplexing unit 1g1 has the same function as the bit stream multiplexing unit 1g. The frequency conversion unit 1a to the SBR encoding unit 1d, the high frequency inverse frequency conversion unit 1h, the short-time power calculation unit 1i, the filter strength parameter calculation unit 1f1, and the bit stream multiplexing unit 1g1 of the speech encoding device 11a shown in FIG. 5 are functions realized by the CPU of the speech encoding device 11a executing a computer program stored in the built-in memory of the speech encoding device 11a. All data required to execute this computer program and all data generated by its execution are stored in a built-in memory of the speech encoding device 11a such as the ROM or the RAM.

The high frequency inverse frequency conversion unit 1h replaces with “0” those coefficients of the QMF-domain signal obtained from the frequency conversion unit 1a that correspond to the low-frequency component encoded by the core codec encoding unit 1c, and then processes the result with a QMF synthesis filter bank to obtain a time-domain signal containing only the high-frequency component. The short-time power calculation unit 1i divides the time-domain high-frequency component obtained from the high frequency inverse frequency conversion unit 1h into short intervals and calculates the power p(r) of each interval. Alternatively, the short-time power may be calculated from the QMF-domain signal according to formula (12).

The filter strength parameter calculation unit 1f1 detects the portions where p(r) changes and determines the value of K(r) so that K(r) becomes larger as the change becomes larger. K(r) may be determined, for example, by the same method as the calculation of T(r) in the signal change detection unit 2e of the speech decoding device 21. Signal change detection may also be performed by other, more sophisticated methods. Alternatively, the filter strength parameter calculation unit 1f1 may obtain the short-time power of each of the low-frequency and high-frequency components, obtain the signal changes Tr(r) and Th(r) of the low-frequency and high-frequency components, respectively, by the same method as the calculation of T(r) in the signal change detection unit 2e of the speech decoding device 21, and determine the value of K(r) from them. In this case, K(r) can be obtained, for example, according to formula (13), where ε is a constant such as 3.0.
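A possible way to combine the two signal changes into K(r) is sketched below. Formula (13) is not reproduced in this text; the form used here, ε times the excess of the high-frequency change Th(r) over the low-frequency change Tr(r), clipped to [0, 1], is an assumption, as is the function name. Only the example value ε = 3.0 comes from the text.

```python
from typing import List, Sequence


def k_from_band_changes(t_low: Sequence[float],
                        t_high: Sequence[float],
                        eps: float = 3.0) -> List[float]:
    """Assumed mapping: K(r) = max(0, min(1, eps * (Th(r) - Tr(r))))."""
    return [max(0.0, min(1.0, eps * (th - tl)))
            for tl, th in zip(t_low, t_high)]


# K(r) is non-zero only where the high band changes more than the low band.
print(k_from_band_changes([1.0, 1.0, 2.0], [1.0, 1.25, 1.0]))
```

The intuition behind this assumed form is that when the low band already shows the same transient, the decoder can derive the envelope from the low band and needs little additional sharpening.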

(Modification 2 of the first embodiment)
The speech encoding device (not shown) of Modification 2 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU comprehensively controls the speech encoding device of Modification 2 by loading a predetermined computer program stored in a built-in memory of the speech encoding device of Modification 2, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device of Modification 2 receives a speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside.

The speech encoding device of Modification 2 functionally includes, in place of the filter strength parameter calculation unit 1f and the bit stream multiplexing unit 1g of the speech encoding device 11, a linear prediction coefficient difference encoding unit (time envelope auxiliary information calculation means, not shown) and a bit stream multiplexing unit (bit stream multiplexing means) that receives the output of this linear prediction coefficient difference encoding unit. The frequency conversion unit 1a to the linear prediction analysis unit 1e, the linear prediction coefficient difference encoding unit, and the bit stream multiplexing unit of the speech encoding device of Modification 2 are functions realized by the CPU of the speech encoding device of Modification 2 executing a computer program stored in the built-in memory of that device. All data required to execute this computer program and all data generated by its execution are stored in a built-in memory of the speech encoding device of Modification 2 such as the ROM or the RAM.

The linear prediction coefficient difference encoding unit uses a_H(n, r) and a_L(n, r) of the input signal to calculate the difference values a_D(n, r) of the linear prediction coefficients according to formula (14).

The linear prediction coefficient difference encoding unit further quantizes a_D(n, r) and transmits it to the bit stream multiplexing unit (the counterpart of the bit stream multiplexing unit 1g). This bit stream multiplexing unit multiplexes a_D(n, r) into the bit stream instead of K(r) and outputs the multiplexed bit stream to the outside via the built-in communication device.

The speech decoding device (not shown) of Modification 2 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU comprehensively controls the speech decoding device of Modification 2 by loading a predetermined computer program stored in a built-in memory of the speech decoding device of Modification 2, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device of Modification 2 receives the encoded multiplexed bit stream output from the speech encoding device 11, the speech encoding device 11a of Modification 1, or the speech encoding device of Modification 2, and outputs the decoded speech signal to the outside.

The speech decoding device of Modification 2 functionally includes a linear prediction coefficient difference decoding unit (not shown) in place of the filter strength adjustment unit 2f of the speech decoding device 21. The bit stream separation unit 2a to the signal change detection unit 2e, the linear prediction coefficient difference decoding unit, and the high frequency generation unit 2g to the frequency inverse conversion unit 2n of the speech decoding device of Modification 2 are functions realized by the CPU of the speech decoding device of Modification 2 executing a computer program stored in the built-in memory of that device. All data required to execute this computer program and all data generated by its execution are stored in a built-in memory of the speech decoding device of Modification 2 such as the ROM or the RAM.

The linear prediction coefficient difference decoding unit uses a_L(n, r) obtained from the low frequency linear prediction analysis unit 2d and a_D(n, r) supplied from the bit stream separation unit 2a to obtain the differentially decoded a_adj(n, r) according to formula (15).
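The differential coding of the linear prediction coefficients in Modification 2 can be sketched as an encoder that quantizes the difference between high-frequency and low-frequency coefficients and a decoder that adds the dequantized difference back to the low-frequency coefficients. Formulas (14) and (15) are not reproduced in this text, so the sign convention a_D = a_H − a_L, the uniform scalar quantizer with step 0.05, and the function names are assumptions.

```python
from typing import List, Sequence


def encode_difference(a_h: Sequence[float], a_l: Sequence[float],
                      step: float = 0.05) -> List[int]:
    """Quantize a_D(n) = a_H(n) - a_L(n) to integer indices (assumed quantizer)."""
    return [round((h - l) / step) for h, l in zip(a_h, a_l)]


def decode_difference(a_l: Sequence[float], indices: Sequence[int],
                      step: float = 0.05) -> List[float]:
    """Reconstruct a_adj(n) = a_L(n) + dequantized a_D(n)."""
    return [l + q * step for l, q in zip(a_l, indices)]


a_high = [1.0, -0.9, 0.3]   # encoder-side high-frequency coefficients
a_low = [1.0, -0.5, 0.1]    # low-frequency coefficients, available at both ends
idx = encode_difference(a_high, a_low)
print(idx, decode_difference(a_low, idx))
```

In a real system the decoder would use its own low-frequency coefficients, computed from the decoded core signal, in place of a_low, so the reconstruction also depends on how closely those match the encoder-side values.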

The linear prediction coefficient difference decoding unit transmits the differentially decoded a_adj(n, r) to the linear prediction filter unit 2k. a_D(n, r) may be a difference value in the domain of the prediction coefficients, as shown in formula (14), or it may be a difference taken after the prediction coefficients have been converted into another representation such as LSP (Linear Spectrum Pair), ISP (Immittance Spectrum Pair), LSF (Linear Spectrum Frequency), ISF (Immittance Spectrum Frequency), or PARCOR coefficients. In that case, the differential decoding is likewise performed in the same representation.

(Second Embodiment)
FIG. 6 is a diagram illustrating the configuration of the speech encoding device 12 according to the second embodiment. The speech encoding device 12 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 12 such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 7) into the RAM and executes it, thereby controlling the speech encoding device 12 as a whole. The communication device of the speech encoding device 12 receives the speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to the outside.

Functionally, the speech encoding device 12 includes a linear prediction coefficient thinning unit 1j (prediction coefficient thinning means), a linear prediction coefficient quantization unit 1k (prediction coefficient quantization means), and a bit stream multiplexing unit 1g2 (bit stream multiplexing means) in place of the filter strength parameter calculation unit 1f and the bit stream multiplexing unit 1g of the speech encoding device 11. The frequency conversion unit 1a to the linear prediction analysis unit 1e (linear prediction analysis means), the linear prediction coefficient thinning unit 1j, the linear prediction coefficient quantization unit 1k, and the bit stream multiplexing unit 1g2 of the speech encoding device 12 shown in FIG. 6 are functions realized when the CPU of the speech encoding device 12 executes a computer program stored in the built-in memory of the speech encoding device 12. By executing this computer program (using the units of the speech encoding device 12 shown in FIG. 6 listed above), the CPU sequentially executes the processing shown in the flowchart of FIG. 7 (steps Sa1 to Sa5 and steps Sc1 to Sc3). All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech encoding device 12 such as the ROM or RAM.

The linear prediction coefficient thinning unit 1j thins out a_H(n, r) obtained from the linear prediction analysis unit 1e in the time direction, and transmits the values of a_H(n, r) for a subset of time slots r_i, together with the corresponding values of r_i, to the linear prediction coefficient quantization unit 1k (processing of step Sc1). Here 0 ≤ i < N_ts, where N_ts is the number of time slots in the frame for which a_H(n, r) is transmitted. The thinning may use a fixed time interval, or it may use unequal time intervals chosen according to the properties of a_H(n, r). For example, G_H(r) of a_H(n, r) may be compared within a frame of a given length, and a_H(n, r) selected for quantization when G_H(r) exceeds a certain value. When the thinning interval is fixed regardless of the properties of a_H(n, r), a_H(n, r) need not be calculated for time slots that are not transmitted.
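The two thinning strategies described above can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function names, the `step` and `threshold` parameters, and the representation of G_H(r) as one scalar per time slot are assumptions made for the example.

```python
from typing import List, Sequence


def thin_uniform(num_slots: int, step: int) -> List[int]:
    """Return the time slot indices r_i kept when thinning at a fixed interval."""
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(0, num_slots, step))


def thin_by_gain(gains: Sequence[float], threshold: float) -> List[int]:
    """Return the time slot indices r_i whose G_H(r) exceeds the threshold.

    gains[r] holds the (assumed scalar) value G_H(r) for time slot r.
    """
    return [r for r, g in enumerate(gains) if g > threshold]
```

With fixed-interval thinning the kept indices are known before analysis, which is why the coefficients for the skipped slots never need to be computed.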

The linear prediction coefficient quantization unit 1k quantizes the thinned high frequency linear prediction coefficients a_H(n, r_i) supplied by the linear prediction coefficient thinning unit 1j, together with the indices r_i of the corresponding time slots, and transmits them to the bit stream multiplexing unit 1g2 (processing of step Sc2). As an alternative configuration, instead of quantizing a_H(n, r_i), the difference values a_D(n, r_i) of the linear prediction coefficients may be quantized, as in the speech encoding device according to Modification 2 of the first embodiment.

The bit stream multiplexing unit 1g2 multiplexes into a bit stream the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and the time slot indices {r_i} corresponding to the quantized a_H(n, r_i) supplied by the linear prediction coefficient quantization unit 1k, and outputs this multiplexed bit stream via the communication device of the speech encoding device 12 (processing of step Sc3).

FIG. 8 is a diagram illustrating the configuration of the speech decoding device 22 according to the second embodiment. The speech decoding device 22 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 22 such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 9) into the RAM and executes it, thereby controlling the speech decoding device 22 as a whole. The communication device of the speech decoding device 22 receives the encoded multiplexed bit stream output from the speech encoding device 12 and outputs the decoded speech signal to the outside.

Functionally, the speech decoding device 22 includes a bit stream separation unit 2a1 (bit stream separation means), a linear prediction coefficient interpolation/extrapolation unit 2p (linear prediction coefficient interpolation/extrapolation means), and a linear prediction filter unit 2k1 (time envelope deformation means) in place of the bit stream separation unit 2a, the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the filter strength adjustment unit 2f, and the linear prediction filter unit 2k of the speech decoding device 21. The bit stream separation unit 2a1, the core codec decoding unit 2b, the frequency conversion unit 2c, the high frequency generation unit 2g to the high frequency adjustment unit 2j, the linear prediction filter unit 2k1, the coefficient addition unit 2m, the frequency inverse conversion unit 2n, and the linear prediction coefficient interpolation/extrapolation unit 2p of the speech decoding device 22 shown in FIG. 8 are functions realized when the CPU of the speech decoding device 22 executes a computer program stored in the built-in memory of the speech decoding device 22. By executing this computer program (using the units shown in FIG. 8 listed above), the CPU sequentially executes the processing shown in the flowchart of FIG. 9 (steps Sb1 to Sb2, step Sd1, steps Sb5 to Sb8, step Sd2, and steps Sb10 to Sb11). All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech decoding device 22 such as the ROM or RAM.

That is, the speech decoding device 22 includes the bit stream separation unit 2a1, the linear prediction coefficient interpolation/extrapolation unit 2p, and the linear prediction filter unit 2k1 in place of the bit stream separation unit 2a, the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the filter strength adjustment unit 2f, and the linear prediction filter unit 2k of the speech decoding device 21.

The bit stream separation unit 2a1 separates the multiplexed bit stream input via the communication device of the speech decoding device 22 into the time slot indices r_i corresponding to the quantized a_H(n, r_i), the SBR auxiliary information, and the encoded bit stream.

The linear prediction coefficient interpolation/extrapolation unit 2p receives the time slot indices r_i corresponding to the quantized a_H(n, r_i) from the bit stream separation unit 2a1, and obtains a_H(n, r) for the time slots in which no linear prediction coefficients are transmitted by interpolation or extrapolation (processing of step Sd1). The linear prediction coefficient interpolation/extrapolation unit 2p can extrapolate the linear prediction coefficients, for example, according to the following equation (16).

Here r_i0 is the member of the set {r_i} of time slots in which linear prediction coefficients are transmitted that is closest to r, and δ is a constant satisfying 0 < δ < 1.
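Equation (16) itself is not reproduced in this text. One form consistent with the description (nearest transmitted slot r_{i0}, constant 0 < δ < 1) is sketched below. The exponent |r − r_{i0}| and the symbol N for the prediction order are assumptions made for illustration, not a quotation of the original equation.

```latex
a_{H}(n, r) = \delta^{\,|r - r_{i0}|}\, a_{H}(n, r_{i0}), \qquad 1 \le n \le N
```

Under this assumed form the extrapolated coefficients decay toward zero with distance from the nearest transmitted slot, so the filter gradually loses its shaping effect in slots far from any transmitted coefficients.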

The linear prediction coefficient interpolation/extrapolation unit 2p can also interpolate the linear prediction coefficients, for example, according to the following equation (17), where r_i0 < r < r_{i0+1}.
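Since equation (17) is not reproduced in this text, the following is only a sketch of one common choice, linear interpolation between the two transmitted slots that bracket r. The function name and argument layout are hypothetical, and the original equation may weight the two slots differently.

```python
from typing import List, Sequence


def interpolate_lpc(a_lo: Sequence[float], a_hi: Sequence[float],
                    r_lo: int, r_hi: int, r: int) -> List[float]:
    """Linearly interpolate prediction coefficients for slot r.

    a_lo and a_hi hold a_H(n, r_lo) and a_H(n, r_hi) for n = 1..N,
    where r_lo < r < r_hi are the bracketing transmitted slots.
    """
    if not r_lo < r < r_hi:
        raise ValueError("r must lie strictly between r_lo and r_hi")
    w = (r - r_lo) / (r_hi - r_lo)  # weight of the later slot
    return [(1.0 - w) * x + w * y for x, y in zip(a_lo, a_hi)]
```

Interpolating the coefficients directly does not guarantee a stable synthesis filter, which is one reason a representation such as LSP may be preferred for this step.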

The linear prediction coefficient interpolation/extrapolation unit 2p may instead convert the linear prediction coefficients to another representation such as LSP (Line Spectrum Pair), ISP (Immittance Spectrum Pair), LSF (Line Spectrum Frequency), ISF (Immittance Spectrum Frequency), or PARCOR coefficients, perform the interpolation or extrapolation there, and convert the result back to linear prediction coefficients. The interpolated or extrapolated a_H(n, r) is transmitted to the linear prediction filter unit 2k1 and used as the linear prediction coefficients in the linear prediction synthesis filter processing; it may also be used as the linear prediction coefficients in the linear prediction inverse filter unit 2i. When a_D(n, r_i) rather than a_H(n, r) is multiplexed in the bit stream, the linear prediction coefficient interpolation/extrapolation unit 2p performs, prior to the interpolation or extrapolation described above, the same differential decoding as the speech decoding device according to Modification 2 of the first embodiment.

The linear prediction filter unit 2k1 performs linear prediction synthesis filter processing in the frequency direction on q_adj(n, r) output from the high frequency adjustment unit 2j, using the interpolated or extrapolated a_H(n, r) obtained from the linear prediction coefficient interpolation/extrapolation unit 2p (processing of step Sd2). The transfer function of the linear prediction filter unit 2k1 is given by the following equation (18). Like the linear prediction filter unit 2k of the speech decoding device 21, the linear prediction filter unit 2k1 deforms the time envelope of the high frequency component generated by SBR by performing this linear prediction synthesis filter processing.
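Equation (18) is not reproduced in this text. The sketch below assumes the usual all-pole synthesis form 1 / (1 + Σ a_H(n, r) z^(-n)), applied across the subband index within a single time slot; the sign convention, the zero initial state, and the function name are assumptions.

```python
from typing import List, Sequence


def lp_synthesis_over_frequency(q_slot: Sequence[complex],
                                a: Sequence[float]) -> List[complex]:
    """Run an all-pole filter along the frequency (subband) axis of one slot.

    q_slot[k] is the subband sample for subband k in a fixed time slot r,
    and a[n-1] is a_H(n, r). Implements
        y(k) = x(k) - sum_{n=1..N} a(n) * y(k - n)
    with samples below the first subband treated as zero.
    """
    y: List[complex] = []
    for k, x_k in enumerate(q_slot):
        acc = complex(x_k)
        for n, coeff in enumerate(a, start=1):
            if k - n >= 0:
                acc -= coeff * y[k - n]
        y.append(acc)
    return y
```

Because the recursion runs over frequency rather than time, shaping the spectrum of the subband sequence in this way reshapes the time envelope of the corresponding signal, which is the effect the paragraph above relies on.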

(Third Embodiment)
FIG. 10 is a diagram illustrating the configuration of the speech encoding device 13 according to the third embodiment. The speech encoding device 13 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 13 such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 11) into the RAM and executes it, thereby controlling the speech encoding device 13 as a whole. The communication device of the speech encoding device 13 receives the speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to the outside.

Functionally, the speech encoding device 13 includes a time envelope calculation unit 1m (time envelope auxiliary information calculation means), an envelope shape parameter calculation unit 1n (time envelope auxiliary information calculation means), and a bit stream multiplexing unit 1g3 (bit stream multiplexing means) in place of the linear prediction analysis unit 1e, the filter strength parameter calculation unit 1f, and the bit stream multiplexing unit 1g of the speech encoding device 11. The frequency conversion unit 1a to the SBR encoding unit 1d, the time envelope calculation unit 1m, the envelope shape parameter calculation unit 1n, and the bit stream multiplexing unit 1g3 of the speech encoding device 13 shown in FIG. 10 are functions realized when the CPU of the speech encoding device 13 executes a computer program stored in the built-in memory of the speech encoding device 13. By executing this computer program (using the units shown in FIG. 10 listed above), the CPU sequentially executes the processing shown in the flowchart of FIG. 11 (steps Sa1 to Sa4 and steps Se1 to Se3). All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech encoding device 13 such as the ROM or RAM.

The time envelope calculation unit 1m receives q(k, r) and obtains the time envelope information e(r) of the high frequency component of the signal, for example, by obtaining the power of q(k, r) for each time slot (processing of step Se1). In this case, e(r) is obtained according to the following equation (19).
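Equation (19) is not reproduced in this text. As a sketch of a per-slot power measure of the kind described, the function below takes the square root of the summed squared magnitudes over the subbands from `k_start` upward. Taking the square root (rather than the raw power), the `k_start` parameter, and the `q[k][r]` layout are assumptions for the example.

```python
import math
from typing import List, Sequence


def time_envelope(q: Sequence[Sequence[complex]], k_start: int = 0) -> List[float]:
    """Return e[r] = sqrt(sum over k >= k_start of |q[k][r]|^2).

    q[k][r] is the QMF-domain sample for subband k and time slot r.
    """
    num_slots = len(q[0]) if q else 0
    return [
        math.sqrt(sum(abs(q[k][r]) ** 2 for k in range(k_start, len(q))))
        for r in range(num_slots)
    ]
```

Setting `k_start` to the first high frequency subband restricts the envelope to the high frequency component, as the encoder-side description requires.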

The envelope shape parameter calculation unit 1n receives e(r) from the time envelope calculation unit 1m, and further receives the time boundaries {b_i} of the SBR envelopes from the SBR encoding unit 1d. Here 0 ≤ i ≤ Ne, where Ne is the number of SBR envelopes in the encoded frame. For each SBR envelope in the encoded frame, the envelope shape parameter calculation unit 1n obtains an envelope shape parameter s(i) (0 ≤ i < Ne), for example, according to the following equation (20), together with the auxiliary definition of equation (21) (processing of step Se2). The envelope shape parameter s(i) corresponds to the time envelope auxiliary information, and this holds throughout the third embodiment.

In these equations, s(i) is a parameter indicating the magnitude of the variation of e(r) within the i-th SBR envelope, which satisfies b_i ≤ r < b_{i+1}; the larger the variation of the time envelope, the larger the value s(i) takes. Equations (20) and (21) are one example of how to calculate s(i); s(i) may instead be obtained using, for example, the SFM (Spectral Flatness Measure) of e(r) or the ratio between its maximum and minimum values. s(i) is then quantized and transmitted to the bit stream multiplexing unit 1g3.
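Equations (20) and (21) are not reproduced in this text. One measure matching the description, a parameter that grows with the variation of e(r) inside each SBR envelope, is the sample standard deviation of e(r) over b_i ≤ r < b_{i+1}. The sketch below uses that measure as an assumption; the function name and the choice of the (n − 1) normalization are illustrative only.

```python
import math
from typing import List, Sequence


def envelope_shape_parameters(e: Sequence[float],
                              boundaries: Sequence[int]) -> List[float]:
    """Return one s(i) per SBR envelope as the sample std. dev. of e(r).

    boundaries holds b_0..b_Ne; envelope i covers b_i <= r < b_{i+1}.
    """
    s: List[float] = []
    for i in range(len(boundaries) - 1):
        seg = e[boundaries[i]:boundaries[i + 1]]
        if len(seg) < 2:
            s.append(0.0)  # a single slot has no measurable variation
            continue
        mean = sum(seg) / len(seg)
        var = sum((mean - x) ** 2 for x in seg) / (len(seg) - 1)
        s.append(math.sqrt(var))
    return s
```

A flat envelope gives s(i) = 0, and a sharp transient inside the envelope gives a large s(i), which matches the role of s(i) as a compact description of envelope shape.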

The bit stream multiplexing unit 1g3 multiplexes into a bit stream the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and s(i), and outputs this multiplexed bit stream via the communication device of the speech encoding device 13 (processing of step Se3).

FIG. 12 is a diagram illustrating the configuration of the speech decoding device 23 according to the third embodiment. The speech decoding device 23 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 23 such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 13) into the RAM and executes it, thereby controlling the speech decoding device 23 as a whole. The communication device of the speech decoding device 23 receives the encoded multiplexed bit stream output from the speech encoding device 13 and outputs the decoded speech signal to the outside.

Functionally, the speech decoding device 23 includes a bit stream separation unit 2a2 (bit stream separation means), a low frequency time envelope calculation unit 2r (low frequency time envelope analysis means), an envelope shape adjustment unit 2s (time envelope adjustment means), a high frequency time envelope calculation unit 2t, a time envelope flattening unit 2u, and a time envelope deformation unit 2v (time envelope deformation means) in place of the bit stream separation unit 2a, the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the filter strength adjustment unit 2f, the high frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device 21. The bit stream separation unit 2a2, the core codec decoding unit 2b to the frequency conversion unit 2c, the high frequency generation unit 2g, the high frequency adjustment unit 2j, the coefficient addition unit 2m, the frequency inverse conversion unit 2n, and the low frequency time envelope calculation unit 2r to the time envelope deformation unit 2v of the speech decoding device 23 shown in FIG. 12 are functions realized when the CPU of the speech decoding device 23 executes a computer program stored in the built-in memory of the speech decoding device 23. By executing this computer program (using the units shown in FIG. 12 listed above), the CPU sequentially executes the processing shown in the flowchart of FIG. 13 (steps Sb1 to Sb2, steps Sf1 to Sf2, step Sb5, steps Sf3 to Sf4, step Sb8, step Sf5, and steps Sb10 to Sb11). All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech decoding device 23 such as the ROM or RAM.

The bit stream separation unit 2a2 separates the multiplexed bit stream input via the communication device of the speech decoding device 23 into s(i), the SBR auxiliary information, and the encoded bit stream. The low frequency time envelope calculation unit 2r receives q_dec(k, r), which contains the low frequency component, from the frequency conversion unit 2c, and obtains e(r) according to the following equation (22) (processing of step Sf1).

The envelope shape adjustment unit 2s adjusts e(r) using s(i) and obtains the adjusted time envelope information e_adj(r) (processing of step Sf2). This adjustment of e(r) can be performed, for example, according to the following equation (23), whose auxiliary quantities are defined by equations (24) and (25).

Equations (23) to (25) are one example of an adjustment method; any other adjustment method that brings the shape of e_adj(r) closer to the shape indicated by s(i) may be used.
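Equations (23) to (25) are not reproduced in this text, so the sketch below is not that method. It illustrates one of the "other adjustment methods" the text allows: within one SBR envelope, the deviation of e(r) from its mean is rescaled so that the sample standard deviation of the result equals the transmitted s(i). Treating s(i) as a standard deviation, and all names used, are assumptions.

```python
import math
from typing import List, Sequence


def adjust_envelope(e_seg: Sequence[float], s_target: float) -> List[float]:
    """Rescale e(r) about its mean so its sample std. dev. becomes s_target.

    e_seg holds e(r) for b_i <= r < b_{i+1}. If the segment is flat or has a
    single slot, it is returned unchanged because no shape can be imposed.
    """
    n = len(e_seg)
    if n < 2:
        return list(e_seg)
    mean = sum(e_seg) / n
    v = math.sqrt(sum((x - mean) ** 2 for x in e_seg) / (n - 1))
    if v == 0.0:
        return list(e_seg)
    gain = s_target / v
    return [mean + gain * (x - mean) for x in e_seg]
```

A large gain can push values below zero; a practical variant would clamp the result to a small positive floor before it is used as an envelope.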

The high frequency time envelope calculation unit 2t calculates the time envelope e_exp(r) from q_exp(k, r) obtained from the high frequency generation unit 2g, according to the following equation (26) (processing of step Sf3).

The time envelope flattening unit 2u flattens the time envelope of q_exp(k, r) obtained from the high frequency generation unit 2g according to the following equation (27), and transmits the resulting QMF-domain signal q_flat(k, r) to the high frequency adjustment unit 2j (processing of step Sf4).
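Equation (27) is not reproduced in this text. A natural reading of "flattening" is to divide every subband sample in a time slot by that slot's envelope value, which the sketch below does. The division form, the `eps` guard against silent slots, and the `q_exp[k][r]` layout are assumptions for the example.

```python
from typing import List, Sequence


def flatten_time_envelope(q_exp: Sequence[Sequence[complex]],
                          e_exp: Sequence[float],
                          eps: float = 1e-12) -> List[List[complex]]:
    """Return q_flat[k][r] = q_exp[k][r] / e_exp[r] for every subband k.

    eps keeps the division finite when a slot's envelope is (near) zero.
    """
    return [
        [row[r] / max(e_exp[r], eps) for r in range(len(e_exp))]
        for row in q_exp
    ]
```

After this step each slot carries roughly unit envelope, so a later multiplication by a target envelope imposes that envelope without interference from the original one.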

The flattening of the time envelope in the time envelope flattening unit 2u may be omitted. Also, instead of performing the time envelope calculation and the time envelope flattening of the high frequency component on the output of the high frequency generation unit 2g, these may be performed on the output of the high frequency adjustment unit 2j. Furthermore, the time envelope used in the time envelope flattening unit 2u may be e_adj(r) obtained from the envelope shape adjustment unit 2s rather than e_exp(r) obtained from the high frequency time envelope calculation unit 2t.

The time envelope deformation unit 2v deforms q_adj(k, r) obtained from the high frequency adjustment unit 2j using e_adj(r) obtained from the envelope shape adjustment unit 2s, and obtains the QMF-domain signal q_envadj(k, r) whose time envelope has been deformed (processing of step Sf5). This deformation is performed according to the following equation (28). q_envadj(k, r) is transmitted to the coefficient addition unit 2m as the QMF-domain signal corresponding to the high frequency component.
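Equation (28) is not reproduced in this text. Given that the high frequency signal has been flattened beforehand and that e_adj(r) is the target envelope, a plausible multiplicative form is shown below. This is an assumption for illustration, not a quotation of the original equation.

```latex
q_{\mathrm{envadj}}(k, r) = q_{\mathrm{adj}}(k, r) \cdot e_{\mathrm{adj}}(r)
```

Under this form every high frequency subband in time slot r is scaled by the same factor, so the spectral shape set by the high frequency adjustment unit 2j is preserved within the slot while the envelope across slots follows e_adj(r).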

(Fourth Embodiment)
FIG. 14 is a diagram illustrating the configuration of the speech decoding device 24 according to the fourth embodiment. The speech decoding device 24 physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24 such as the ROM into the RAM and executes it, thereby controlling the speech decoding device 24 as a whole. The communication device of the speech decoding device 24 receives the encoded multiplexed bit stream output from the speech encoding device 11 or the speech encoding device 13 and outputs the decoded speech signal to the outside.

Functionally, the speech decoding device 24 includes the components of the speech decoding device 21 (the core codec decoding unit 2b, the frequency conversion unit 2c, the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the filter strength adjustment unit 2f, the high frequency generation unit 2g, the high frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, the high frequency adjustment unit 2j, the linear prediction filter unit 2k, the coefficient addition unit 2m, and the frequency inverse conversion unit 2n) and the components of the speech decoding device 23 (the low frequency time envelope calculation unit 2r, the envelope shape adjustment unit 2s, and the time envelope deformation unit 2v). The speech decoding device 24 further includes a bit stream separation unit 2a3 (bit stream separation means) and an auxiliary information conversion unit 2w. The order of the linear prediction filter unit 2k and the time envelope deformation unit 2v may be the reverse of that shown in FIG. 14. The speech decoding device 24 preferably takes as input a bit stream encoded by the speech encoding device 11 or the speech encoding device 13. The components of the speech decoding device 24 shown in FIG. 14 are functions realized when the CPU of the speech decoding device 24 executes a computer program stored in the built-in memory of the speech decoding device 24. All data required to execute this computer program, and all data generated by executing it, are stored in a built-in memory of the speech decoding device 24 such as the ROM or RAM.

The bit stream separation unit 2a3 separates the multiplexed bit stream input via the communication device of the speech decoding device 24 into the time envelope auxiliary information, the SBR auxiliary information, and the encoded bit stream. The time envelope auxiliary information may be K(r) described in the first embodiment or s(i) described in the third embodiment. It may also be another parameter X(r) that is neither K(r) nor s(i).

The auxiliary information conversion unit 2w converts the input time envelope auxiliary information to obtain K(r) and s(i). When the time envelope auxiliary information is K(r), the auxiliary information conversion unit 2w converts K(r) into s(i). It may do so, for example, by obtaining the average value of K(r) within the interval b_i ≤ r < b_{i+1}, as given by equation (29), and then converting that average value into s(i) using a predetermined table. When the time envelope auxiliary information is s(i), the auxiliary information conversion unit 2w converts s(i) into K(r). It may do so, for example, by converting s(i) into K(r) using a predetermined table. Here, i and r are associated with each other so as to satisfy b_i ≤ r < b_{i+1}.
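The conversion from K(r) to s(i) described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the bin edges in `K_EDGES`, and the values in `S_VALUES` are hypothetical stand-ins for the "predetermined table", whose actual contents are not given in the text.

```python
from bisect import bisect_right

def average_k_per_envelope(k, borders):
    """Average K(r) over each SBR envelope b_i <= r < b_{i+1}."""
    return [sum(k[b0:b1]) / (b1 - b0) for b0, b1 in zip(borders, borders[1:])]

# Hypothetical table: bin edges for the averaged K and the s(i) assigned to each bin.
K_EDGES = [0.25, 0.5, 0.75]
S_VALUES = [0.0, 0.3, 0.6, 0.9]

def k_to_s(k, borders):
    """Map the per-envelope average of K(r) to s(i) by table lookup."""
    return [S_VALUES[bisect_right(K_EDGES, avg)]
            for avg in average_k_per_envelope(k, borders)]
```

Here `borders` plays the role of the SBR envelope time boundaries {b_i}, so a frame with borders `[0, 4, 8]` yields two values of s(i).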

When the time envelope auxiliary information is a parameter X(r) that is neither s(i) nor K(r), the auxiliary information conversion unit 2w converts X(r) into K(r) and s(i). It desirably performs this conversion by, for example, converting X(r) into K(r) and s(i) using predetermined tables. The auxiliary information conversion unit 2w also desirably transmits X(r) as one representative value per SBR envelope. The table that converts X(r) into K(r) and the table that converts X(r) into s(i) may differ from each other.

(Modification 3 of the first embodiment)
In the speech decoding device 21 of the first embodiment, the linear prediction filter unit 2k may include automatic gain control processing. This automatic gain control processing matches the power of the QMF domain signal output from the linear prediction filter unit 2k to the power of the QMF domain signal input to it. The QMF domain signal after gain control, q_{syn,pow}(n, r), is generally given by equation (30), in which P_0(r) and P_1(r) are given by equations (31) and (32), respectively.

This automatic gain control processing adjusts the power of the high frequency component of the output signal of the linear prediction filter unit 2k to a value equal to that before linear prediction filtering. As a result, the effect of the power adjustment of the high frequency signal performed by the high frequency adjustment unit 2j is preserved in the output signal of the linear prediction filter unit 2k, in which the time envelope of the high frequency component generated based on SBR has been deformed. The automatic gain control processing can also be performed individually on arbitrary frequency ranges of the QMF domain signal. Processing for each frequency range can be realized by restricting n in equations (30), (31), and (32) to that frequency range. For example, the i-th frequency range can be expressed as F_i ≤ n < F_{i+1} (here i is an index identifying an arbitrary frequency range of the QMF domain signal). F_i denotes the boundaries of the frequency ranges and is desirably the frequency boundary table of the envelope scale factors defined in SBR of “MPEG4 AAC”. The frequency boundary table is determined by the high frequency generation unit 2g in accordance with the SBR specification of “MPEG4 AAC”. With this automatic gain control processing, the power within each such frequency range of the high frequency component of the output signal of the linear prediction filter unit 2k is adjusted to a value equal to that before linear prediction filtering. As a result, the effect of the power adjustment of the high frequency signal performed by the high frequency adjustment unit 2j is preserved per frequency range in the output signal of the linear prediction filter unit 2k. A change similar to Modification 3 of the first embodiment may also be applied to the linear prediction filter unit 2k in the fourth embodiment.
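The automatic gain control described above can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of equations (30) to (32): the function names, the `q[r][n]` layout (time slot r first, frequency index n second), and the handling of a zero-power slot are all hypothetical. The only property taken from the text is that, per time slot, the output power is scaled to match the input power, optionally within one frequency range F_i ≤ n < F_{i+1}.

```python
import math

def slot_power(row, lo, hi):
    """Sum of |q(n, r)|^2 over lo <= n < hi for one time slot."""
    return sum(abs(x) ** 2 for x in row[lo:hi])

def automatic_gain_control(q_in, q_out, band=None):
    """Scale each slot of q_out so its power matches q_in inside `band`.

    q_in  : filter input,  q_in[r][n]  (complex)
    q_out : filter output, q_out[r][n] (complex)
    band  : optional (lo, hi) frequency range; defaults to all indices
    """
    result = []
    for row_in, row_out in zip(q_in, q_out):
        lo, hi = band if band is not None else (0, len(row_out))
        p0 = slot_power(row_in, lo, hi)    # power before filtering
        p1 = slot_power(row_out, lo, hi)   # power after filtering
        gain = math.sqrt(p0 / p1) if p1 > 0.0 else 1.0
        result.append([x * gain if lo <= n < hi else x
                       for n, x in enumerate(row_out)])
    return result
```

Calling the function once per range of a frequency boundary table gives the per-range variant mentioned in the text.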

(Modification 1 of the third embodiment)
The envelope shape parameter calculation unit 1n in the speech encoding device 13 of the third embodiment can also be realized by the following processing. For each SBR envelope in the encoded frame, the envelope shape parameter calculation unit 1n obtains the envelope shape parameter s(i) (0 ≤ i < Ne) according to equation (33). The average value of e(r) within the SBR envelope, which is used in equation (33), is calculated according to equation (21). Here, an SBR envelope denotes a time range satisfying b_i ≤ r < b_{i+1}. {b_i} are the time boundaries of the SBR envelopes, included as information in the SBR auxiliary information; they are the boundaries of the time ranges covered by the SBR envelope scale factors, each of which represents the average signal energy in a given time range and a given frequency range. min(·) denotes the minimum value over the range b_i ≤ r < b_{i+1}. In this case, therefore, the envelope shape parameter s(i) is a parameter indicating the ratio between the minimum value and the average value, within the SBR envelope, of the adjusted time envelope information.

The envelope shape adjustment unit 2s in the speech decoding device 23 of the third embodiment can likewise be realized by the following processing. The envelope shape adjustment unit 2s adjusts e(r) using s(i) and obtains the adjusted time envelope information e_adj(r). The adjustment follows equation (35) or equation (36). Equation (35) adjusts the envelope shape so that the ratio between the minimum value and the average value of the adjusted time envelope information e_adj(r) within the SBR envelope equals the value of the envelope shape parameter s(i). A change similar to Modification 1 of the third embodiment described above may also be applied to the fourth embodiment.
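One way to picture this pair of operations is the sketch below. Equations (33) and (35) are not available in this text, so the forms used here are assumptions chosen only to be consistent with the description: the shape parameter is taken as one minus the min-to-mean ratio of the envelope, and the adjustment rescales deviations from the mean so that the adjusted envelope keeps its mean and reproduces the requested parameter. The function names and the sign convention are hypothetical.

```python
def envelope_shape_parameter(e):
    """Assumed form: s = 1 - min(e) / mean(e) over one SBR envelope."""
    mean = sum(e) / len(e)
    return 1.0 - min(e) / mean

def adjust_envelope_shape(e, s):
    """Rescale deviations from the mean so the result has shape parameter s.

    The mean of the envelope is preserved; only its depth changes.
    """
    mean = sum(e) / len(e)
    spread = mean - min(e)
    if spread == 0.0:
        return list(e)  # flat envelope: there is no shape to adjust
    return [mean * (1.0 + s * (v - mean) / spread) for v in e]
```

With `e = [0.5, 1.0, 1.5, 1.0]`, the encoder-side function returns 0.5, and adjusting the same envelope with `s = 0.8` on the decoder side deepens the dip while leaving the mean at 1.0.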

(Modification 2 of the third embodiment)
The time envelope deformation unit 2v may use equations (37) and (38) instead of equation (28). As shown in equation (37), e_{adj,scaled}(r) is the adjusted time envelope information e_adj(r) with its gain controlled so that q_adj(k, r) and q_{envadj}(k, r) have equal power within the SBR envelope. As shown in equation (38), in Modification 2 of the third embodiment, q_{envadj}(k, r) is obtained by multiplying the QMF domain signal q_adj(k, r) by e_{adj,scaled}(r) rather than by e_adj(r). The time envelope deformation unit 2v can therefore deform the time envelope of the QMF domain signal q_adj(k, r) so that the signal power within the SBR envelope is the same before and after the deformation. Here, an SBR envelope denotes a time range satisfying b_i ≤ r < b_{i+1}. {b_i} are the time boundaries of the SBR envelopes, included as information in the SBR auxiliary information; they are the boundaries of the time ranges covered by the SBR envelope scale factors, each of which represents the average signal energy in a given time range and a given frequency range. The term “SBR envelope” in the embodiments of the present invention corresponds to the term “SBR envelope time segment” in “MPEG4 AAC” as defined in “ISO/IEC 14496-3”, and throughout the embodiments “SBR envelope” means the same as “SBR envelope time segment”.

A change similar to Modification 2 of the third embodiment described above may also be applied to the fourth embodiment.
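The power-preserving gain control can be sketched as follows. The sketch relies only on what the text states, namely that the scaled envelope makes the power within one SBR envelope equal before and after multiplication; the function names, the `q_adj[r][k]` layout, and the zero-power fallback are assumptions.

```python
import math

def scale_envelope_gain(q_adj, e_adj):
    """Return e_adj scaled so that multiplying q_adj by it preserves total power.

    q_adj : q_adj[r][k], complex QMF samples of one SBR envelope
    e_adj : e_adj[r], one gain per time slot of that envelope
    """
    power_before = sum(abs(x) ** 2 for row in q_adj for x in row)
    power_after = sum(abs(e * x) ** 2
                      for e, row in zip(e_adj, q_adj) for x in row)
    scale = math.sqrt(power_before / power_after) if power_after > 0.0 else 1.0
    return [e * scale for e in e_adj]

def deform_time_envelope(q_adj, e_adj):
    """Multiply each time slot by the power-normalized envelope gain."""
    e_scaled = scale_envelope_gain(q_adj, e_adj)
    return [[e * x for x in row] for e, row in zip(e_scaled, q_adj)]
```

Because a single scale factor is applied to the whole envelope, the relative shape of e_adj(r) is unchanged; only its overall level moves.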

(Modification 3 of the third embodiment)
Equation (19) may instead be equation (39).

Equation (22) may instead be equation (40).

Equation (26) may instead be equation (41).

When equations (39) and (40) are used, the time envelope information e(r) is the power of each QMF subband sample normalized by the average power within the SBR envelope, with the square root then taken. Here, a QMF subband sample is the signal vector corresponding to a single time index “r” in the QMF domain signal, and denotes one subsample in the QMF domain. Throughout the embodiments of the present invention, the term “time slot” means the same as “QMF subband sample”. In this case, the time envelope information e(r) represents the gain coefficient by which each QMF subband sample is to be multiplied, and the same holds for the adjusted time envelope information e_adj(r).
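The normalization described above can be sketched as follows. This is a minimal illustration, not a transcription of equations (39) to (41): the function name, the `q[r][k]` layout, and the choice of summing power over whichever subbands are passed in are assumptions.

```python
import math

def time_envelope(q, borders):
    """Per-slot gain e(r): sqrt(slot power / mean slot power in the SBR envelope).

    q       : q[r][k], complex QMF samples (time slot r, subband k)
    borders : SBR envelope time boundaries b_0 < b_1 < ...
    """
    e = []
    for b0, b1 in zip(borders, borders[1:]):
        powers = [sum(abs(x) ** 2 for x in q[r]) for r in range(b0, b1)]
        mean_power = sum(powers) / (b1 - b0)
        e.extend(math.sqrt(p / mean_power) if mean_power > 0.0 else 1.0
                 for p in powers)
    return e
```

With this normalization the mean of e(r)^2 within each SBR envelope is 1, which is what allows e(r) to be read as a gain applied to each QMF subband sample.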

(Modification 1 of the fourth embodiment)
The speech decoding device 24a (not shown) of Modification 1 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device 24a as a whole by loading a predetermined computer program stored in built-in memory of the speech decoding device 24a, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device 24a receives the encoded multiplexed bit stream output from the speech encoding device 11 or the speech encoding device 13, and outputs the decoded speech signal to the outside. Functionally, the speech decoding device 24a includes a bit stream separation unit 2a4 (not shown) in place of the bit stream separation unit 2a3 of the speech decoding device 24, and a time envelope auxiliary information generation unit 2y (not shown) in place of the auxiliary information conversion unit 2w. The bit stream separation unit 2a4 separates the multiplexed bit stream into SBR auxiliary information and an encoded bit stream. The time envelope auxiliary information generation unit 2y generates time envelope auxiliary information based on information contained in the encoded bit stream and the SBR auxiliary information.

The time envelope auxiliary information for a given SBR envelope can be generated using, for example, the time width of that SBR envelope (b_{i+1} - b_i), the frame class, the strength parameter of the inverse filter, the noise floor, the magnitude of the high frequency power, the ratio of high frequency power to low frequency power, or the autocorrelation coefficients or prediction gain obtained by linear prediction analysis, in the frequency direction, of the low frequency signal represented in the QMF domain. The time envelope auxiliary information can be generated by determining K(r) or s(i) from the value of one or more of these parameters. For example, K(r) or s(i) may be determined from (b_{i+1} - b_i) so that it becomes smaller as the time width of the SBR envelope becomes wider, or alternatively so that it becomes larger as the time width becomes wider. A similar change may also be applied to the first and third embodiments.
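As an illustration of deriving the parameter from the envelope width alone, the sketch below uses a clamped linear mapping. The text only requires that the value decrease (or increase) monotonically with the width; the linear form, the function name, and the parameters `max_width`, `k_min`, and `k_max` are hypothetical.

```python
def strength_from_width(borders, max_width, k_min=0.0, k_max=1.0, decreasing=True):
    """Derive one strength value per SBR envelope from its width b_{i+1} - b_i.

    With decreasing=True, wider envelopes get smaller values; with
    decreasing=False, wider envelopes get larger values.
    """
    values = []
    for b0, b1 in zip(borders, borders[1:]):
        frac = min(b1 - b0, max_width) / max_width  # clamp to [0, 1]
        if decreasing:
            frac = 1.0 - frac
        values.append(k_min + frac * (k_max - k_min))
    return values
```

The returned value could be used directly as s(i), or held constant over b_i ≤ r < b_{i+1} to serve as K(r).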

(Modification 2 of the fourth embodiment)
The speech decoding device 24b (see FIG. 15) of Modification 2 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device 24b as a whole by loading a predetermined computer program stored in built-in memory of the speech decoding device 24b, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device 24b receives the encoded multiplexed bit stream output from the speech encoding device 11 or the speech encoding device 13, and outputs the decoded speech signal to the outside. As shown in FIG. 15, the speech decoding device 24b includes a primary high frequency adjustment unit 2j1 and a secondary high frequency adjustment unit 2j2 in place of the high frequency adjustment unit 2j.

Here, the primary high frequency adjustment unit 2j1 adjusts the QMF domain signal of the high frequency band by performing the linear prediction inverse filtering in the time direction, the gain adjustment, and the noise superimposition that belong to the “HF adjustment” step in SBR of “MPEG4 AAC”. The output signal of the primary high frequency adjustment unit 2j1 then corresponds to the signal W_2 in the description in section 4.6.18.7.6, “Assembling HF signals”, of the “SBR tool” in “ISO/IEC 14496-3:2005”. The linear prediction filter unit 2k (or the linear prediction filter unit 2k1) and the time envelope deformation unit 2v deform the time envelope of the output signal of the primary high frequency adjustment unit. The secondary high frequency adjustment unit 2j2 performs, on the QMF domain signal output from the time envelope deformation unit 2v, the sine wave addition that belongs to the “HF adjustment” step in SBR of “MPEG4 AAC”. The processing of the secondary high frequency adjustment unit corresponds to the processing that generates the signal Y from the signal W_2 in the description in section 4.6.18.7.6, “Assembling HF signals”, of the “SBR tool” in “ISO/IEC 14496-3:2005”, with the signal W_2 replaced by the output signal of the time envelope deformation unit 2v.

In the above description, only the sine wave addition is assigned to the secondary high frequency adjustment unit 2j2, but any of the processes in the “HF adjustment” step may be assigned to it. Similar modifications may also be applied to the first, second, and third embodiments. In that case, since the first and second embodiments include a linear prediction filter unit (linear prediction filter units 2k, 2k1) but no time envelope deformation unit, the output signal of the primary high frequency adjustment unit 2j1 is processed by the linear prediction filter unit, and the output signal of the linear prediction filter unit is then processed by the secondary high frequency adjustment unit 2j2.

Since the third embodiment includes the time envelope deformation unit 2v but no linear prediction filter unit, the output signal of the primary high frequency adjustment unit 2j1 is processed by the time envelope deformation unit 2v, and the output signal of the time envelope deformation unit 2v is then processed by the secondary high frequency adjustment unit.

In the speech decoding devices of the fourth embodiment (speech decoding devices 24, 24a, and 24b), the order of processing by the linear prediction filter unit 2k and the time envelope deformation unit 2v may be reversed. That is, the output signal of the high frequency adjustment unit 2j or the primary high frequency adjustment unit 2j1 may first be processed by the time envelope deformation unit 2v, and the output signal of the time envelope deformation unit 2v may then be processed by the linear prediction filter unit 2k.

The time envelope auxiliary information may also take a form that includes binary control information indicating whether or not processing by the linear prediction filter unit 2k or the time envelope deformation unit 2v is to be performed, and that additionally includes one or more of the filter strength parameter K(r), the envelope shape parameter s(i), and X(r), a parameter that determines both K(r) and s(i), only when this control information indicates that the processing is to be performed.

(Modification 3 of the fourth embodiment)
The speech decoding device 24c (see FIG. 16) of Modification 3 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device 24c as a whole by loading a predetermined computer program stored in built-in memory of the speech decoding device 24c, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 17), into the RAM and executing it. The communication device of the speech decoding device 24c receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 16, the speech decoding device 24c includes a primary high frequency adjustment unit 2j3 and a secondary high frequency adjustment unit 2j4 in place of the high frequency adjustment unit 2j, and individual signal component adjustment units 2z1, 2z2, and 2z3 in place of the linear prediction filter unit 2k and the time envelope deformation unit 2v (the individual signal component adjustment units correspond to time envelope deformation means).

The primary high frequency adjustment unit 2j3 outputs the QMF domain signal of the high frequency band as a copy signal component. Alternatively, it may output as the copy signal component a signal obtained by applying to the QMF domain signal of the high frequency band at least one of linear prediction inverse filtering in the time direction and gain adjustment (frequency characteristic adjustment), using the SBR auxiliary information supplied from the bit stream separation unit 2a3. The primary high frequency adjustment unit 2j3 further generates a noise signal component and a sine wave signal component using the SBR auxiliary information supplied from the bit stream separation unit 2a3, and outputs the copy signal component, the noise signal component, and the sine wave signal component separately from one another (processing of step Sg1). Depending on the content of the SBR auxiliary information, the noise signal component and the sine wave signal component may not be generated.

The individual signal component adjustment units 2z1, 2z2, and 2z3 process each of the signal components contained in the output of the primary high frequency adjustment unit (processing of step Sg2). The processing in the individual signal component adjustment units 2z1, 2z2, and 2z3 may be linear prediction synthesis filtering in the frequency direction using the linear prediction coefficients obtained from the filter strength adjustment unit 2f, as in the linear prediction filter unit 2k (process 1). It may be multiplication of each QMF subband sample by a gain coefficient using the time envelope obtained from the envelope shape adjustment unit 2s, as in the time envelope deformation unit 2v (process 2). It may be the frequency-direction linear prediction synthesis filtering of process 1 applied to the input signal, followed by the gain multiplication of process 2 applied to its output (process 3). It may be the gain multiplication of process 2 applied to the input signal, followed by the frequency-direction linear prediction synthesis filtering of process 1 applied to its output (process 4). The individual signal component adjustment units 2z1, 2z2, and 2z3 may output the input signal unchanged, without performing time envelope deformation on it (process 5). The processing may apply some other operation that deforms the time envelope of the input signal by a method other than processes 1 to 5 (process 6). It may also be a combination of several of processes 1 to 6 in an arbitrary order (process 7).

The processing in the individual signal component adjustment units 2z1, 2z2, and 2z3 may be the same for all of them, but the units may also deform the time envelope of each of the signal components contained in the output of the primary high frequency adjustment unit by mutually different methods. For example, different processes may be applied to the copy signal, the noise signal, and the sine wave signal: the individual signal component adjustment unit 2z1 performs process 2 on the input copy signal, the individual signal component adjustment unit 2z2 performs process 3 on the input noise signal component, and the individual signal component adjustment unit 2z3 performs process 5 on the input sine wave signal. In this case, the filter strength adjustment unit 2f and the envelope shape adjustment unit 2s may send the same linear prediction coefficients and time envelope to each of the individual signal component adjustment units 2z1, 2z2, and 2z3, may send mutually different linear prediction coefficients and time envelopes, or may send the same linear prediction coefficients and time envelope to any two or more of them. Because one or more of the individual signal component adjustment units 2z1, 2z2, and 2z3 may output the input signal unchanged without performing time envelope deformation (process 5), the units as a whole perform time envelope processing on at least one of the signal components output from the primary high frequency adjustment unit 2j3 (if all of the individual signal component adjustment units 2z1, 2z2, and 2z3 perform process 5, no time envelope deformation is applied to any signal component and the effect of the present invention is not obtained).

The processing in each of the individual signal component adjustment units 2z1, 2z2, and 2z3 may be fixed to one of processes 1 to 7, or which of processes 1 to 7 is performed may be determined dynamically based on control information supplied from outside. In that case, the control information is desirably included in the multiplexed bit stream. The control information may indicate which of processes 1 to 7 is to be performed in a specific SBR envelope time segment, encoded frame, or other time range, or it may indicate which of processes 1 to 7 is to be performed without specifying the time range of the control.

The secondary high frequency adjustment unit 2j4 adds together the processed signal components output from the individual signal component adjustment units 2z1, 2z2, and 2z3 and outputs the sum to the coefficient addition unit (processing of step Sg3). The secondary high frequency adjustment unit 2j4 may also apply to the copy signal component at least one of linear prediction inverse filtering in the time direction and gain adjustment (frequency characteristic adjustment), using the SBR auxiliary information supplied from the bit stream separation unit 2a3.
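Steps Sg2 and Sg3 can be sketched for the example assignment given earlier (process 2 for the copy component, process 3 for the noise component, process 5 for the sine wave component). This is a simplified illustration under stated assumptions: all function names are hypothetical, signals are stored as `q[r][k]`, one shared set of coefficients and one shared envelope are used for every component, and the all-pole recursion along the frequency index stands in for the linear prediction synthesis filter with an assumed sign convention.

```python
def process1_lp_synthesis(q, coeffs):
    """Process 1: all-pole filtering along the frequency index k in each slot.

    Assumed convention: y[k] = x[k] - sum_j coeffs[j-1] * y[k-j].
    """
    out = []
    for row in q:
        y = []
        for k, x in enumerate(row):
            acc = x
            for j, a in enumerate(coeffs, start=1):
                if k - j >= 0:
                    acc -= a * y[k - j]
            y.append(acc)
        out.append(y)
    return out

def process2_gain(q, envelope):
    """Process 2: multiply each time slot by its time envelope gain."""
    return [[g * x for x in row] for g, row in zip(envelope, q)]

def process5_passthrough(q):
    """Process 5: output the input unchanged."""
    return [list(row) for row in q]

def adjust_and_sum(copy, noise, sine, coeffs, envelope):
    """Step Sg2 per component, then step Sg3 (sum of the processed components)."""
    copy_out = process2_gain(copy, envelope)                                   # process 2
    noise_out = process2_gain(process1_lp_synthesis(noise, coeffs), envelope)  # process 3
    sine_out = process5_passthrough(sine)                                      # process 5
    return [[a + b + c for a, b, c in zip(r0, r1, r2)]
            for r0, r1, r2 in zip(copy_out, noise_out, sine_out)]
```

Keeping the sine wave component on the pass-through path means its level is unaffected by the envelope gain applied to the other two components.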

The individual signal component adjustment units 2z1, 2z2, and 2z3 may operate in cooperation with one another: two or more signal components, each having undergone one of processing 1 to 7, are added together, and one of processing 1 to 7 is then applied to the sum to generate an intermediate output signal. In this case, the secondary high frequency adjustment unit 2j4 adds the intermediate output signal to the signal components not yet included in it, and outputs the result to the coefficient adding unit. Specifically, it is desirable to apply processing 5 to the copy signal component and processing 1 to the noise component, add these two signal components together, and then apply processing 2 to the sum to generate the intermediate output signal. In this case, the secondary high frequency adjustment unit 2j4 adds the sine wave signal component to the intermediate output signal and outputs the result to the coefficient adding unit.
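The preferred order of operations in the cooperative case can be sketched as follows. This is only an illustration: processing 1 and processing 2 are defined elsewhere in the description, so they are passed in here as placeholder callables (`shape_noise` and `shape_mix` are hypothetical names), and the function name `combine_components` is likewise an assumption.

```python
import numpy as np

def combine_components(copy_sig, noise_sig, sine_sig, shape_noise, shape_mix):
    """Illustrative signal flow for the cooperative case.

    shape_noise stands in for processing 1 and shape_mix for processing 2;
    both are placeholders supplied by the caller, not the patent's definitions.
    """
    copy_out = copy_sig                              # processing 5: passed through unchanged
    noise_out = shape_noise(noise_sig)               # placeholder for processing 1
    intermediate = shape_mix(copy_out + noise_out)   # placeholder for processing 2 on the sum
    # The sine wave component is added last (role of unit 2j4), so it is not
    # affected by either time envelope operation.
    return intermediate + sine_sig
```

With this ordering the sine wave component bypasses both time envelope operations, which matches the stated preference of leaving sine wave components undeformed.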

The primary high frequency adjustment unit 2j3 is not limited to outputting the three signal components (the copy signal component, the noise signal component, and the sine wave signal component); it may output any plurality of signal components in mutually separated form. A signal component in this case may be the sum of two or more of the copy signal component, the noise signal component, and the sine wave signal component, or a signal obtained by band-dividing any one of them. The number of signal components may be other than three, in which case the number of individual signal component adjustment units may also be other than three.

The high frequency signal generated by SBR consists of three elements: a copy signal component obtained by copying the low frequency band to the high frequency band, a noise signal, and a sine wave signal. Because the copy signal, the noise signal, and the sine wave signal each have a different time envelope, deforming the time envelope of each signal component in a different way, as the individual signal component adjustment units of this modification do, can further improve the subjective quality of the decoded signal compared with the other embodiments of the present invention. In particular, a noise signal generally has a flat time envelope, whereas a copy signal has a time envelope close to that of the low frequency band signal. Handling them separately and applying different processing to each makes it possible to control the time envelopes of the copy signal and the noise signal independently, which is effective in improving the subjective quality of the decoded signal. Specifically, it is preferable to apply time envelope deformation (processing 3 or processing 4) to the noise signal, to apply processing different from that for the noise signal (processing 1 or processing 2) to the copy signal, and to apply processing 5 to the sine wave signal (that is, no time envelope deformation). Alternatively, it is preferable to apply time envelope deformation (processing 3 or processing 4) to the noise signal and processing 5 to the copy signal and the sine wave signal (that is, no time envelope deformation).

(Modification 4 of the first embodiment)
The speech encoding device 11b (FIG. 44) of Modification 4 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device 11b in an integrated manner by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11b, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device 11b receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device 11b includes a linear prediction analysis unit 1e1 in place of the linear prediction analysis unit 1e of the speech encoding device 11, and further includes a time slot selection unit 1p.

The time slot selection unit 1p receives the QMF domain signal from the frequency conversion unit 1a and selects the time slots on which the linear prediction analysis unit 1e1 performs linear prediction analysis. Based on the selection result notified by the time slot selection unit 1p, the linear prediction analysis unit 1e1 performs linear prediction analysis on the QMF domain signal of the selected time slots in the same way as the linear prediction analysis unit 1e, and obtains at least one of high frequency linear prediction coefficients and low frequency linear prediction coefficients. The filter strength parameter calculation unit 1f calculates the filter strength parameter using the linear prediction coefficients obtained by the linear prediction analysis unit 1e1 for the time slots selected by the time slot selection unit 1p. The time slot selection unit 1p may select time slots using, for example, at least one of the selection methods based on the signal power of the high frequency component QMF domain signal, as used by the time slot selection unit 3a of the decoding device 21a of this modification described later. In that case, the high frequency component QMF domain signal used by the time slot selection unit 1p is preferably the frequency component, within the QMF domain signal received from the frequency conversion unit 1a, that is encoded by the SBR encoding unit 1d. The time slot selection may use at least one of the above methods, at least one method different from them, or a combination of these.

The speech decoding device 21a (see FIG. 18) of Modification 4 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device 21a in an integrated manner by loading a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 19) stored in a built-in memory of the speech decoding device 21a, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device 21a receives the encoded multiplexed bit stream and outputs the decoded speech signal to outside. As shown in FIG. 18, the speech decoding device 21a includes a low frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and a linear prediction filter unit 2k3 in place of the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device 21, and further includes a time slot selection unit 3a.

The time slot selection unit 3a determines, for the QMF domain signal q_exp(k, r) of the high frequency component of time slot r generated by the high frequency generation unit 2g, whether the linear prediction filter unit 2k is to apply linear prediction synthesis filtering, and selects the time slots to which linear prediction synthesis filtering is applied (processing of step Sh1). The time slot selection unit 3a notifies the low frequency linear prediction analysis unit 2d1, the signal change detection unit 2e1, the high frequency linear prediction analysis unit 2h1, the linear prediction inverse filter unit 2i1, and the linear prediction filter unit 2k3 of the time slot selection result. Based on the selection result notified by the time slot selection unit 3a, the low frequency linear prediction analysis unit 2d1 performs linear prediction analysis on the QMF domain signal of the selected time slot r1 in the same way as the low frequency linear prediction analysis unit 2d, and obtains low frequency linear prediction coefficients (processing of step Sh2). Based on the selection result notified by the time slot selection unit 3a, the signal change detection unit 2e1 detects the temporal change of the QMF domain signal of the selected time slot in the same way as the signal change detection unit 2e, and outputs the detection result T(r1).

The filter strength adjustment unit 2f adjusts the filter strength of the low frequency linear prediction coefficients obtained by the low frequency linear prediction analysis unit 2d1 for the time slots selected by the time slot selection unit 3a, and obtains the adjusted linear prediction coefficients a_dec(n, r1). Based on the selection result notified by the time slot selection unit 3a, the high frequency linear prediction analysis unit 2h1 performs linear prediction analysis in the frequency direction on the QMF domain signal of the high frequency component generated by the high frequency generation unit 2g for the selected time slot r1, in the same way as the high frequency linear prediction analysis unit 2h, and obtains high frequency linear prediction coefficients a_exp(n, r1) (processing of step Sh3). Based on the selection result notified by the time slot selection unit 3a, the linear prediction inverse filter unit 2i1 applies, to the QMF domain signal q_exp(k, r) of the high frequency component of the selected time slot r1, linear prediction inverse filtering in the frequency direction with a_exp(n, r1) as coefficients, in the same way as the linear prediction inverse filter unit 2i (processing of step Sh4).
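As a rough illustration of steps Sh3 and Sh4, the sketch below runs a frequency-direction linear prediction analysis on each selected time slot and then applies the corresponding inverse filter, leaving unselected slots untouched. It assumes the autocorrelation method (the abstract names the covariance and autocorrelation methods as options), a residual convention e(k) = q(k) + Σ a(n) q(k−n), a small prediction order, and a non-silent slot; the function names and the array layout `(bands, slots)` are hypothetical.

```python
import numpy as np

def lpc_autocorr(x, order):
    """Autocorrelation-method LPC along a complex sequence x.

    Returns a(1..order) for the residual e(k) = x(k) + sum_n a(n) x(k - n).
    Assumes x is not all zero (otherwise the normal equations are singular).
    """
    n = len(x)
    # r[m] = sum_k x(k) * conj(x(k - m))
    r = np.array([np.vdot(x[:n - m], x[m:]) for m in range(order + 1)])
    R = np.empty((order, order), dtype=complex)
    for m in range(order):
        for j in range(order):
            lag = m - j
            R[m, j] = r[lag] if lag >= 0 else np.conj(r[-lag])
    return np.linalg.solve(R, -r[1:])

def inverse_filter_selected(q_exp, selected_slots, order=2):
    """Apply frequency-direction inverse filtering only to the selected slots.

    q_exp has shape (num_bands, num_slots); the band axis is the filtering axis.
    """
    out = np.asarray(q_exp, dtype=complex).copy()
    for r1 in selected_slots:
        x = out[:, r1].copy()
        a = lpc_autocorr(x, order)
        y = x.copy()
        for n in range(1, order + 1):
            y[n:] += a[n - 1] * x[:-n]   # e(k) = x(k) + sum_n a(n) x(k - n)
        out[:, r1] = y
    return out
```

Restricting the analysis to the selected slots is what reduces computation in this modification: the normal equations are solved once per selected slot rather than once per slot.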

Based on the selection result notified by the time slot selection unit 3a, the linear prediction filter unit 2k3 applies linear prediction synthesis filtering in the frequency direction to the QMF domain signal q_adj(k, r1) of the high frequency component of the selected time slot r1 output from the high frequency adjustment unit 2j, using a_adj(n, r1) obtained from the filter strength adjustment unit 2f, in the same way as the linear prediction filter unit 2k (processing of step Sh5). The changes to the linear prediction filter unit 2k described in Modification 3 may also be applied to the linear prediction filter unit 2k3. When selecting the time slots to which linear prediction synthesis filtering is applied, the time slot selection unit 3a may, for example, select one or more time slots r in which the signal power of the high frequency component QMF domain signal q_exp(k, r) is greater than a predetermined value P_{exp,Th}. The signal power of q_exp(k, r) is desirably obtained by the following equation.

$$P_{exp}(r) = \sum_{k=k_x}^{k_x+M-1} \left| q_{exp}(k, r) \right|^2$$

Here, M is a value representing the frequency range above the lower limit frequency k_x of the high frequency component generated by the high frequency generation unit 2g; the frequency range of the high frequency component generated by the high frequency generation unit 2g may be expressed as k_x <= k < k_x + M. The predetermined value P_{exp,Th} may be the average value of P_exp(r) over a predetermined time width that includes time slot r. The predetermined time width may be an SBR envelope.

Time slots may also be selected so as to include a time slot in which the signal power of the high frequency component QMF domain signal reaches a peak. The peak of the signal power may be determined, for example, by taking the moving average P_{exp,MA}(r) of the signal power and treating as a peak the QMF domain signal power of the high frequency component in the time slot r at which

$$P_{exp,MA}(r+1) - P_{exp,MA}(r)$$

changes from a positive value to a negative value. The moving average of the signal power, P_{exp,MA}(r), can be obtained, for example, by the following equation.

$$P_{exp,MA}(r) = \frac{1}{c} \sum_{r'=r-\frac{c}{2}}^{r+\frac{c}{2}-1} P_{exp}(r')$$

Here, c is a predetermined value that defines the range over which the average is taken. The peak of the signal power may be obtained by the above method or by a different method.
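The power-based selection rules above (threshold and peak) can be sketched in a few lines. The sketch assumes the per-slot power is the sum of squared magnitudes over the bands k_x <= k < k_x + M, uses the mean power over the supplied span as the default threshold, and places the moving-average window around slot r with out-of-range slots treated as zero; the window placement, the edge handling, and all function names are assumptions made for illustration.

```python
import numpy as np

def slot_power(q_exp, k_x, M):
    """Per-slot power over bands k_x <= k < k_x + M; q_exp has shape (bands, slots)."""
    return np.sum(np.abs(q_exp[k_x:k_x + M, :]) ** 2, axis=0)

def select_by_threshold(p_exp, p_th=None):
    """Indices of slots whose power exceeds p_th (default: mean power over the span)."""
    if p_th is None:
        p_th = p_exp.mean()
    return np.flatnonzero(p_exp > p_th)

def moving_average(p_exp, c):
    """Moving average over c slots starting at r - c//2 (zero outside the span)."""
    n = len(p_exp)
    out = np.empty(n)
    for r in range(n):
        lo = r - c // 2
        hi = lo + c
        out[r] = p_exp[max(lo, 0):min(hi, n)].sum() / c
    return out

def peak_slots(p_ma):
    """Slots r where p_ma(r+1) - p_ma(r) turns from positive to negative."""
    d = np.diff(p_ma)                      # d[r] = p_ma[r+1] - p_ma[r]
    return np.flatnonzero((d[:-1] > 0) & (d[1:] < 0)) + 1
```

A decoder could take the union of `select_by_threshold` and `peak_slots` as its selected set, since the text allows the methods to be combined.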

Furthermore, at least one time slot may be selected that is contained in a time width t over which the signal power of the high frequency component QMF domain signal goes from a steady state with small variation to a transient state with large variation, where t is smaller than a predetermined value t_th. Likewise, at least one time slot may be selected that is contained in a time width t over which the signal power goes from a transient state with large variation to a steady state with small variation, where t is smaller than the predetermined value t_th. A time slot r in which |P_exp(r+1) − P_exp(r)| is smaller than a predetermined value (or less than or equal to it) may be regarded as being in the steady state, and a time slot r in which |P_exp(r+1) − P_exp(r)| is greater than or equal to the predetermined value (or greater than it) as being in the transient state. Alternatively, a time slot r in which |P_{exp,MA}(r+1) − P_{exp,MA}(r)| is smaller than a predetermined value (or less than or equal to it) may be regarded as being in the steady state, and a time slot r in which |P_{exp,MA}(r+1) − P_{exp,MA}(r)| is greater than or equal to the predetermined value (or greater than it) as being in the transient state. The transient state and the steady state may be defined by the above methods or by different methods. The time slot selection may use at least one of the above methods, at least one method different from them, or a combination of these.
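The steady/transient classification can be illustrated as below. The sketch labels slot r as transient when |P(r+1) − P(r)| is at least a threshold `delta` and as steady otherwise, then reports the slots at which the state changes. How the time width t is measured against t_th is not spelled out in the text, so that test is deliberately left out; `delta` and the function names are assumptions.

```python
import numpy as np

def classify_transient(p, delta):
    """True for slots r with |p[r+1] - p[r]| >= delta (transient), else False (steady).

    The result has len(p) - 1 entries because the last slot has no successor.
    """
    return np.abs(np.diff(np.asarray(p, dtype=float))) >= delta

def state_changes(transient):
    """List (r, kind) for each slot r whose state differs from that of slot r - 1."""
    changes = []
    for r in range(1, len(transient)):
        if transient[r] and not transient[r - 1]:
            changes.append((r, "steady_to_transient"))
        elif transient[r - 1] and not transient[r]:
            changes.append((r, "transient_to_steady"))
    return changes
```

The same functions apply unchanged if the moving-average power is passed in instead of the raw per-slot power, which corresponds to the alternative definition in the text.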

(Modification 5 of the first embodiment)
The speech encoding device 11c (FIG. 45) of Modification 5 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device 11c in an integrated manner by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11c, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device 11c receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device 11c includes a time slot selection unit 1p1 and a bit stream multiplexing unit 1g4 in place of the time slot selection unit 1p and the bit stream multiplexing unit 1g of the speech encoding device 11b of Modification 4.

The time slot selection unit 1p1 selects time slots in the same way as the time slot selection unit 1p described in Modification 4 of the first embodiment, and sends time slot selection information to the bit stream multiplexing unit 1g4. The bit stream multiplexing unit 1g4 multiplexes the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and the filter strength parameter calculated by the filter strength parameter calculation unit 1f in the same way as the bit stream multiplexing unit 1g, additionally multiplexes the time slot selection information received from the time slot selection unit 1p1, and outputs the multiplexed bit stream via the communication device of the speech encoding device 11c. The time slot selection information is the information received by the time slot selection unit 3a1 of the speech decoding device 21b described later, and may include, for example, the index r1 of a time slot to be selected. It may also be, for example, a parameter used in the time slot selection method of the time slot selection unit 3a1.

The speech decoding device 21b (see FIG. 20) of Modification 5 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device 21b in an integrated manner by loading a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 21) stored in a built-in memory of the speech decoding device 21b, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device 21b receives the encoded multiplexed bit stream and outputs the decoded speech signal to outside.

As shown in FIG. 20, the speech decoding device 21b includes a bit stream separation unit 2a5 and a time slot selection unit 3a1 in place of the bit stream separation unit 2a and the time slot selection unit 3a of the speech decoding device 21a of Modification 4, and time slot selection information is input to the time slot selection unit 3a1. The bit stream separation unit 2a5 separates the multiplexed bit stream into the filter strength parameter, the SBR auxiliary information, and the encoded bit stream in the same way as the bit stream separation unit 2a, and additionally separates the time slot selection information. The time slot selection unit 3a1 selects time slots based on the time slot selection information sent from the bit stream separation unit 2a5 (processing of step Si1). The time slot selection information is information used for selecting time slots and may include, for example, the index r1 of a time slot to be selected. It may also be, for example, a parameter used in the time slot selection methods described in Modification 4. In that case, in addition to the time slot selection information, the QMF domain signal of the high frequency component generated by the high frequency generation unit 2g is also input to the time slot selection unit 3a1, although this is not shown in the figure. The parameter may be, for example, a predetermined value used for selecting the time slots (for example, P_{exp,Th} or t_th).
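The two forms of time slot selection information (explicit indices, or parameters for a decoder-side selection rule) might be handled as in the following sketch. The description does not define a bit stream syntax, so the container `TimeSlotSelectionInfo`, its field names, and the function `select_time_slots` are all hypothetical, and only the power-threshold rule is shown for the parameter case.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

import numpy as np

@dataclass
class TimeSlotSelectionInfo:
    """Hypothetical container for decoded time slot selection information."""
    slot_indices: Optional[Sequence[int]] = None   # explicit indices r1, if transmitted
    power_threshold: Optional[float] = None        # e.g. a transmitted P_{exp,Th}

def select_time_slots(info, p_exp=None):
    """Return the selected slot indices for one frame.

    Explicit indices take priority; otherwise the transmitted threshold is applied
    to the per-slot power p_exp of the generated high frequency component.
    """
    if info.slot_indices is not None:
        return sorted(int(r) for r in info.slot_indices)
    if info.power_threshold is not None:
        if p_exp is None:
            raise ValueError("per-slot power is required for parameter-based selection")
        return [int(r) for r in np.flatnonzero(np.asarray(p_exp) > info.power_threshold)]
    return []
```

In the parameter case the decoder needs the high frequency QMF domain signal power as an extra input, which is why the text notes that this signal is also fed to the time slot selection unit 3a1.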

(Modification 6 of the first embodiment)
The speech encoding device 11d (not shown) of Modification 6 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device 11d in an integrated manner by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11d, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device 11d receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device 11d includes a short-time power calculation unit 1i1 (not shown) in place of the short-time power calculation unit 1i of the speech encoding device 11a of Modification 1, and further includes a time slot selection unit 1p2.

The time slot selection unit 1p2 receives the QMF domain signal from the frequency conversion unit 1a and selects the time slots corresponding to the time intervals on which the short-time power calculation unit 1i performs short-time power calculation. Based on the selection result notified by the time slot selection unit 1p2, the short-time power calculation unit 1i1 calculates the short-time power of the time intervals corresponding to the selected time slots in the same way as the short-time power calculation unit 1i of the speech encoding device 11a of Modification 1.

(Modification 7 of the first embodiment)
The speech encoding device 11e (not shown) of Modification 7 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device 11e in an integrated manner by loading a predetermined computer program stored in a built-in memory of the speech encoding device 11e, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device 11e receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device 11e includes a time slot selection unit 1p3 (not shown) in place of the time slot selection unit 1p2 of the speech encoding device 11d of Modification 6. In place of the bit stream multiplexing unit 1g1, it also includes a bit stream multiplexing unit that additionally receives the output of the time slot selection unit 1p3. The time slot selection unit 1p3 selects time slots in the same way as the time slot selection unit 1p2 described in Modification 6 of the first embodiment, and sends time slot selection information to the bit stream multiplexing unit.

(Modification 8 of the first embodiment)
The speech encoding device (not shown) of Modification 8 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device of Modification 8 in an integrated manner by loading a predetermined computer program stored in a built-in memory of that device, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device of Modification 8 receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device of Modification 8 has the configuration of the speech encoding device described in Modification 2 and further includes a time slot selection unit 1p.

The speech decoding device (not shown) of Modification 8 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech decoding device of Modification 8 in an integrated manner by loading a predetermined computer program stored in a built-in memory of that device, such as the ROM, into the RAM and executing it. The communication device of the speech decoding device of Modification 8 receives the encoded multiplexed bit stream and outputs the decoded speech signal to outside. The speech decoding device of Modification 8 includes a low frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and a linear prediction filter unit 2k3 in place of the low frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device described in Modification 2, and further includes a time slot selection unit 3a.

(Modification 9 of the first embodiment)
The speech encoding device (not shown) of Modification 9 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU controls the speech encoding device of Modification 9 in an integrated manner by loading a predetermined computer program stored in a built-in memory of that device, such as the ROM, into the RAM and executing it. The communication device of the speech encoding device of Modification 9 receives a speech signal to be encoded from outside and outputs the encoded multiplexed bit stream to outside. The speech encoding device of Modification 9 includes a time slot selection unit 1p1 in place of the time slot selection unit 1p of the speech encoding device described in Modification 8. In place of the bit stream multiplexing unit described in Modification 8, it also includes a bit stream multiplexing unit that receives the output of the time slot selection unit 1p1 in addition to the inputs to the bit stream multiplexing unit described in Modification 8.

The speech decoding device (not shown) of Modification 9 of the first embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device of Modification 9, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device of Modification 9 in an integrated manner. The communication device of the speech decoding device of Modification 9 receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. The speech decoding device of Modification 9 includes a time slot selection unit 3a1 in place of the time slot selection unit 3a of the speech decoding device described in Modification 8. Furthermore, in place of the bit stream separation unit 2a, it includes a bit stream separation unit that separates a_D(n, r) described in Modification 2 instead of the filter strength parameter separated by the bit stream separation unit 2a5.

(Modification 1 of the second embodiment)
The speech encoding device 12a (FIG. 46) of Modification 1 of the second embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 12a, such as the ROM, into the RAM and executes it, thereby controlling the speech encoding device 12a in an integrated manner. The communication device of the speech encoding device 12a receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside. The speech encoding device 12a includes a linear prediction analysis unit 1e1 in place of the linear prediction analysis unit 1e of the speech encoding device 12, and further includes a time slot selection unit 1p.

The speech decoding device 22a (see FIG. 22) of Modification 1 of the second embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 22a, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 23), into the RAM and executes it, thereby controlling the speech decoding device 22a in an integrated manner. The communication device of the speech decoding device 22a receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 22, in place of the high-frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, the linear prediction filter unit 2k1, and the linear prediction coefficient interpolation/extrapolation unit 2p of the speech decoding device 22 of the second embodiment, the speech decoding device 22a includes a high-frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, a linear prediction filter unit 2k2, and a linear prediction coefficient interpolation/extrapolation unit 2p1, and further includes a time slot selection unit 3a.

The time slot selection unit 3a notifies the high-frequency linear prediction analysis unit 2h1, the linear prediction inverse filter unit 2i1, the linear prediction filter unit 2k2, and the linear prediction coefficient interpolation/extrapolation unit 2p1 of the time slot selection result. Based on the selection result notified by the time slot selection unit 3a, the linear prediction coefficient interpolation/extrapolation unit 2p1 obtains, by interpolation or extrapolation in the same manner as the linear prediction coefficient interpolation/extrapolation unit 2p, a_H(n, r) corresponding to each time slot r1 that has been selected and for which no linear prediction coefficients have been transmitted (processing of step Sj1). Based on the same selection result, the linear prediction filter unit 2k2 performs, for each selected time slot r1, linear prediction synthesis filtering in the frequency direction on q_adj(n, r1) output from the high-frequency adjustment unit 2j, using the interpolated or extrapolated a_H(n, r1) obtained from the linear prediction coefficient interpolation/extrapolation unit 2p1, in the same manner as the linear prediction filter unit 2k1 (processing of step Sj2). The changes to the linear prediction filter unit 2k described in Modification 3 of the first embodiment may also be applied to the linear prediction filter unit 2k2.
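Steps Sj1 and Sj2 can be illustrated with a minimal Python sketch. Several things here are assumptions made only for illustration and are not taken from this description: the coefficients are interpolated linearly between the nearest transmitted time slots and held constant when extrapolating, the synthesis filter uses the sign convention y[k] = q[k] - Σ a[n]·y[k-n], and the names `interpolate_lpc` and `lp_synthesis_along_frequency` are hypothetical.

```python
import numpy as np

def interpolate_lpc(transmitted, r1):
    """Coefficients for time slot r1 from a dict {slot index: coefficient list}.

    Assumption: linear interpolation between the nearest transmitted slots,
    and a copy of the nearest transmitted set when r1 lies outside them.
    """
    if not transmitted:
        raise ValueError("at least one transmitted coefficient set is required")
    if r1 in transmitted:
        return np.asarray(transmitted[r1], dtype=float)
    slots = sorted(transmitted)
    earlier = [s for s in slots if s < r1]
    later = [s for s in slots if s > r1]
    if earlier and later:
        r0, r2 = earlier[-1], later[0]
        w = (r1 - r0) / (r2 - r0)
        a0 = np.asarray(transmitted[r0], dtype=float)
        a2 = np.asarray(transmitted[r2], dtype=float)
        return (1.0 - w) * a0 + w * a2
    nearest = earlier[-1] if earlier else later[0]
    return np.asarray(transmitted[nearest], dtype=float).copy()

def lp_synthesis_along_frequency(q_slot, a):
    """All-pole filtering of one time slot along the frequency index k.

    Assumed convention: y[k] = q[k] - sum_{n=1..N} a[n-1] * y[k-n].
    """
    q_slot = np.asarray(q_slot, dtype=complex)
    y = np.zeros_like(q_slot)
    for k in range(len(q_slot)):
        acc = q_slot[k]
        for n in range(1, len(a) + 1):
            if k - n >= 0:
                acc -= a[n - 1] * y[k - n]
        y[k] = acc
    return y
```

In this sketch only the time slots reported as selected would be passed to `lp_synthesis_along_frequency`; the other time slots would be left unfiltered.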

(Modification 2 of the second embodiment)
The speech encoding device 12b (FIG. 47) of Modification 2 of the second embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 12b, such as the ROM, into the RAM and executes it, thereby controlling the speech encoding device 12b in an integrated manner. The communication device of the speech encoding device 12b receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside. The speech encoding device 12b includes a time slot selection unit 1p1 and a bit stream multiplexing unit 1g5 in place of the time slot selection unit 1p and the bit stream multiplexing unit 1g2 of the speech encoding device 12a of Modification 1. Like the bit stream multiplexing unit 1g2, the bit stream multiplexing unit 1g5 multiplexes the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and the index of the time slot corresponding to the quantized linear prediction coefficients supplied from the linear prediction coefficient quantization unit 1k. It further multiplexes the time slot selection information received from the time slot selection unit 1p1 into the bit stream, and outputs the multiplexed bit stream via the communication device of the speech encoding device 12b.

The speech decoding device 22b (see FIG. 24) of Modification 2 of the second embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 22b, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 25), into the RAM and executes it, thereby controlling the speech decoding device 22b in an integrated manner. The communication device of the speech decoding device 22b receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 24, the speech decoding device 22b includes a bit stream separation unit 2a6 and a time slot selection unit 3a1 in place of the bit stream separation unit 2a1 and the time slot selection unit 3a of the speech decoding device 22a described in Modification 1, and the time slot selection information is input to the time slot selection unit 3a1. Like the bit stream separation unit 2a1, the bit stream separation unit 2a6 separates the multiplexed bit stream into the quantized a_H(n, r_i), the corresponding time slot index r_i, the SBR auxiliary information, and the encoded bit stream, and additionally separates out the time slot selection information.

(Modification 4 of the third embodiment)
The quantity [formula not reproduced] described in Modification 1 of the third embodiment may be the average value of e(r) within the SBR envelope, or may be a separately defined value.

(Modification 5 of the third embodiment)
As described in Modification 3 of the third embodiment, the adjusted time envelope e_adj(r) is a gain coefficient by which the QMF subband samples are multiplied, as in, for example, Formulas (28), (37), and (38). In view of this, the envelope shape adjustment unit 2s preferably limits e_adj(r) by a predetermined value e_{adj,Th}(r) as follows.

[formula not reproduced]

(Fourth embodiment)
The speech encoding device 14 (FIG. 48) of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 14, such as the ROM, into the RAM and executes it, thereby controlling the speech encoding device 14 in an integrated manner. The communication device of the speech encoding device 14 receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside. The speech encoding device 14 includes a bit stream multiplexing unit 1g7 in place of the bit stream multiplexing unit 1g of the speech encoding device 11b of Modification 4 of the first embodiment, and further includes the time envelope calculation unit 1m and the envelope shape parameter calculation unit 1n of the speech encoding device 13.

Like the bit stream multiplexing unit 1g, the bit stream multiplexing unit 1g7 multiplexes the encoded bit stream calculated by the core codec encoding unit 1c and the SBR auxiliary information calculated by the SBR encoding unit 1d. It further converts the filter strength parameter calculated by the filter strength parameter calculation unit and the envelope shape parameter calculated by the envelope shape parameter calculation unit 1n into time envelope auxiliary information, multiplexes that information as well, and outputs the multiplexed bit stream (the encoded multiplexed bit stream) via the communication device of the speech encoding device 14.

(Modification 4 of the fourth embodiment)
The speech encoding device 14a (FIG. 49) of Modification 4 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 14a, such as the ROM, into the RAM and executes it, thereby controlling the speech encoding device 14a in an integrated manner. The communication device of the speech encoding device 14a receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside. The speech encoding device 14a includes a linear prediction analysis unit 1e1 in place of the linear prediction analysis unit 1e of the speech encoding device 14 of the fourth embodiment, and further includes a time slot selection unit 1p.

The speech decoding device 24d (see FIG. 26) of Modification 4 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24d, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 27), into the RAM and executes it, thereby controlling the speech decoding device 24d in an integrated manner. The communication device of the speech decoding device 24d receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 26, in place of the low-frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high-frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device 24, the speech decoding device 24d includes a low-frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high-frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and a linear prediction filter unit 2k3, and further includes a time slot selection unit 3a. The time envelope deformation unit 2v deforms the QMF-domain signal obtained from the linear prediction filter unit 2k3 using the time envelope information obtained from the envelope shape adjustment unit 2s, in the same manner as the time envelope deformation unit 2v of the third embodiment, the fourth embodiment, and their modifications (processing of step Sk1).

(Modification 5 of the fourth embodiment)
The speech decoding device 24e (see FIG. 28) of Modification 5 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24e, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 29), into the RAM and executes it, thereby controlling the speech decoding device 24e in an integrated manner. The communication device of the speech decoding device 24e receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 28, in Modification 5 the speech decoding device 24e omits the high-frequency linear prediction analysis unit 2h1 and the linear prediction inverse filter unit 2i1 of the speech decoding device 24d described in Modification 4, which, as in the first embodiment, can be omitted throughout the fourth embodiment, and includes a time slot selection unit 3a2 and a time envelope deformation unit 2v1 in place of the time slot selection unit 3a and the time envelope deformation unit 2v of the speech decoding device 24d. Furthermore, the order of the linear prediction synthesis filtering by the linear prediction filter unit 2k3 and the time envelope deformation by the time envelope deformation unit 2v1, which is interchangeable throughout the fourth embodiment, is swapped.

Like the time envelope deformation unit 2v, the time envelope deformation unit 2v1 deforms q_adj(k, r) obtained from the high-frequency adjustment unit 2j using e_adj(r) obtained from the envelope shape adjustment unit 2s, and obtains the QMF-domain signal q_envadj(k, r) whose time envelope has been deformed. It further notifies the time slot selection unit 3a2, as time slot selection information, of parameters obtained during the time envelope deformation processing, or of parameters calculated using at least the parameters obtained during that processing. The time slot selection information may be e(r) of Formulas (22) and (40), or |e(r)|^2 obtained by omitting the square root operation in its calculation. In addition, the average of these values over an interval of multiple time slots (for example, an SBR envelope) [formula not reproduced], given in Formula (24) as [formula not reproduced], may also be used as time slot selection information, where [formula not reproduced].

The time slot selection information may also be e_exp(r) of Formulas (26) and (41), or |e_exp(r)|^2 obtained by omitting the square root operation in its calculation. In addition, the average of these values over an interval of multiple time slots (for example, an SBR envelope) [formula not reproduced], namely [formula not reproduced], may also be used as time slot selection information, where [formula not reproduced].

The time slot selection information may also be e_adj(r) of Formulas (23), (35), and (36), or |e_adj(r)|^2 obtained by omitting the square root operation in its calculation. In addition, the average of these values over an interval of multiple time slots (for example, an SBR envelope) [formula not reproduced], namely [formula not reproduced], may also be used as time slot selection information, where [formula not reproduced].

The time slot selection information may also be e_{adj,scaled}(r) of Formula (37), or |e_{adj,scaled}(r)|^2 obtained by omitting the square root operation in its calculation. In addition, the average of these values over an interval of multiple time slots (for example, an SBR envelope) [formula not reproduced], namely [formula not reproduced], may also be used as time slot selection information, where [formula not reproduced].

The time slot selection information may also be the signal power P_envadj(r) of time slot r of the QMF-domain signal corresponding to the high-frequency components whose time envelope has been deformed, or the signal amplitude value obtained by taking its square root, [formula not reproduced]. In addition, the average of these values over an interval of multiple time slots (for example, an SBR envelope) [formula not reproduced], namely [formula not reproduced], may also be used as time slot selection information, where [formula not reproduced]. Here, M is a value representing the range of frequencies above the lower limit frequency k_x of the high-frequency components generated by the high-frequency generation unit 2g; the frequency range of the high-frequency components generated by the high-frequency generation unit 2g may be expressed as k_x ≤ k < k_x + M.
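As a rough Python illustration of the power-based selection information, the sketch below assumes that P_envadj(r) is the sum of |q_envadj(k, r)|^2 over the high-frequency subbands k_x ≤ k < k_x + M. The defining formula is not reproduced in this text, so this form and the function name `slot_power` are assumptions.

```python
import numpy as np

def slot_power(q_envadj, k_x, M):
    """Per-time-slot power over the subbands k_x <= k < k_x + M.

    q_envadj: complex array of shape (K subbands, R time slots).
    Returns a real array of length R (assumed definition of P_envadj(r)).
    """
    q = np.asarray(q_envadj, dtype=complex)
    band = q[k_x:k_x + M, :]
    return np.sum(np.abs(band) ** 2, axis=0)

# Four subbands, two time slots; the high band is k = 1, 2.
q = np.array([[1, 1],
              [2, 0],
              [0, 3j],
              [5, 5]], dtype=complex)
P = slot_power(q, k_x=1, M=2)   # power per time slot
A = np.sqrt(P)                  # amplitude form (square root of the power)
P_mean = float(P.mean())        # average over the two-slot interval
```

Either the power or its square root could serve as the per-slot parameter; skipping the square root saves one operation per time slot at the decoder.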

Based on the time slot selection information notified by the time envelope deformation unit 2v1, the time slot selection unit 3a2 determines whether the linear prediction filter unit 2k is to apply linear prediction synthesis filtering to the QMF-domain signal q_envadj(k, r) of the high-frequency components of time slot r whose time envelope has been deformed by the time envelope deformation unit 2v1, and selects the time slots to which linear prediction synthesis filtering is applied (processing of step Sp1).

In selecting the time slots to which linear prediction synthesis filtering is applied, the time slot selection unit 3a2 of this modification may select one or more time slots r for which the parameter u(r) included in the time slot selection information notified by the time envelope deformation unit 2v1 is greater than a predetermined value u_Th, or may select one or more time slots r for which u(r) is greater than or equal to u_Th. u(r) may include at least one of e(r), |e(r)|^2, e_exp(r), |e_exp(r)|^2, e_adj(r), |e_adj(r)|^2, e_{adj,scaled}(r), |e_{adj,scaled}(r)|^2, P_envadj(r), and [formula not reproduced], and u_Th may include at least one of [formula not reproduced]. u_Th may also be the average value of u(r) over a predetermined time width (for example, an SBR envelope) that includes time slot r. Furthermore, the selection may be made so that the time slot at which u(r) peaks is included. The peak of u(r) can be calculated in the same manner as the peak of the signal power of the QMF-domain signal of the high-frequency components in Modification 4 of the first embodiment. Furthermore, the steady state and the transient state of Modification 4 of the first embodiment may be determined using u(r) in the same manner as in that modification, and time slots may be selected on that basis. The time slot selection may use at least one of the above methods, at least one method different from the above, or a combination of these.
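The threshold-based selection can be sketched in Python as below. The sketch assumes that u_Th defaults to the mean of u(r) over the interval b_start ≤ r < b_end, and it uses a plain argmax as a stand-in for the peak calculation of Modification 4 of the first embodiment, which is not reproduced here. The function name and its parameters are hypothetical.

```python
import numpy as np

def select_time_slots(u, b_start, b_end, u_th=None,
                      inclusive=False, include_peak=True):
    """Return sorted indices r in [b_start, b_end) selected for LP synthesis filtering.

    u_th=None uses the interval mean of u(r) as the threshold.
    inclusive=True selects u(r) >= u_th instead of u(r) > u_th.
    include_peak=True always adds the slot where u(r) is largest
    (argmax is an assumed, simplified peak definition).
    """
    segment = np.asarray(u, dtype=float)[b_start:b_end]
    if segment.size == 0:
        return []
    if u_th is None:
        u_th = segment.mean()
    mask = segment >= u_th if inclusive else segment > u_th
    selected = set((np.nonzero(mask)[0] + b_start).tolist())
    if include_peak:
        selected.add(int(np.argmax(segment)) + b_start)
    return sorted(selected)

u = [1, 1, 5, 1, 2, 9, 1, 1]
by_mean = select_time_slots(u, 0, 8)                            # threshold = 2.625
by_fixed = select_time_slots(u, 0, 8, u_th=2, inclusive=True)   # threshold = 2
```

With a flat u(r) the strict comparison against the mean selects nothing, so the `include_peak` option is what guarantees at least one selected time slot in this sketch.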

(Modification 6 of the fourth embodiment)
The speech decoding device 24f (see FIG. 30) of Modification 6 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24f, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 29), into the RAM and executes it, thereby controlling the speech decoding device 24f in an integrated manner. The communication device of the speech decoding device 24f receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 30, in Modification 6 the speech decoding device 24f omits the signal change detection unit 2e1, the high-frequency linear prediction analysis unit 2h1, and the linear prediction inverse filter unit 2i1 of the speech decoding device 24d described in Modification 4, which, as in the first embodiment, can be omitted throughout the fourth embodiment, and includes a time slot selection unit 3a2 and a time envelope deformation unit 2v1 in place of the time slot selection unit 3a and the time envelope deformation unit 2v of the speech decoding device 24d. Furthermore, the order of the linear prediction synthesis filtering by the linear prediction filter unit 2k3 and the time envelope deformation by the time envelope deformation unit 2v1, which is interchangeable throughout the fourth embodiment, is swapped.

Based on the time slot selection information notified by the time envelope deformation unit 2v1, the time slot selection unit 3a2 determines whether the linear prediction filter unit 2k3 is to apply linear prediction synthesis filtering to the QMF-domain signal q_envadj(k, r) of the high-frequency components of time slot r whose time envelope has been deformed by the time envelope deformation unit 2v1, selects the time slots to which linear prediction synthesis filtering is applied, and notifies the low-frequency linear prediction analysis unit 2d1 and the linear prediction filter unit 2k3 of the selected time slots.

(Modification 7 of the fourth embodiment)
The speech encoding device 14b (FIG. 50) of Modification 7 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech encoding device 14b, such as the ROM, into the RAM and executes it, thereby controlling the speech encoding device 14b in an integrated manner. The communication device of the speech encoding device 14b receives the speech signal to be encoded from the outside and outputs the encoded multiplexed bit stream to the outside. The speech encoding device 14b includes a bit stream multiplexing unit 1g6 and a time slot selection unit 1p1 in place of the bit stream multiplexing unit 1g7 and the time slot selection unit 1p of the speech encoding device 14a of Modification 4.

Like the bit stream multiplexing unit 1g7, the bit stream multiplexing unit 1g6 multiplexes the encoded bit stream calculated by the core codec encoding unit 1c, the SBR auxiliary information calculated by the SBR encoding unit 1d, and the time envelope auxiliary information obtained by converting the filter strength parameter calculated by the filter strength parameter calculation unit and the envelope shape parameter calculated by the envelope shape parameter calculation unit 1n. It further multiplexes the time slot selection information received from the time slot selection unit 1p1, and outputs the multiplexed bit stream (the encoded multiplexed bit stream) via the communication device of the speech encoding device 14b.

The speech decoding device 24g (see FIG. 31) of Modification 7 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24g, such as the ROM (for example, a computer program for performing the processing shown in the flowchart of FIG. 32), into the RAM and executes it, thereby controlling the speech decoding device 24g in an integrated manner. The communication device of the speech decoding device 24g receives the encoded multiplexed bit stream and outputs the decoded speech signal to the outside. As shown in FIG. 31, the speech decoding device 24g includes a bit stream separation unit 2a7 and a time slot selection unit 3a1 in place of the bit stream separation unit 2a3 and the time slot selection unit 3a of the speech decoding device 24d described in Modification 4.

Like the bitstream separation unit 2a3, the bitstream separation unit 2a7 separates the multiplexed bitstream input via the communication device of the speech decoding device 24g into the time envelope auxiliary information, the SBR auxiliary information, and the encoded bitstream, and additionally separates out the time slot selection information.

(Modification 8 of the fourth embodiment)
The speech decoding device 24h (see FIG. 33) of Modification 8 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 34) stored in a built-in memory of the speech decoding device 24h, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24h in an integrated manner. The communication device of the speech decoding device 24h receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 33, the speech decoding device 24h includes a low-frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high-frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and a linear prediction filter unit 2k3 in place of the low-frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high-frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device 24b of Modification 2, and further includes a time slot selection unit 3a. Like the primary high-frequency adjustment unit 2j1 in Modification 2 of the fourth embodiment, the primary high-frequency adjustment unit 2j1 performs one or more of the processes in the "HF Adjustment" step of SBR in "MPEG-4 AAC" (the processing of step Sm1). Like the secondary high-frequency adjustment unit 2j2 in Modification 2 of the fourth embodiment, the secondary high-frequency adjustment unit 2j2 performs one or more of the processes in the "HF Adjustment" step of SBR in "MPEG-4 AAC" (the processing of step Sm2). The processing performed by the secondary high-frequency adjustment unit 2j2 is preferably processing, among the processes in the "HF Adjustment" step of SBR in "MPEG-4 AAC", that was not performed by the primary high-frequency adjustment unit 2j1.
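The division of the "HF Adjustment" processes between the primary and secondary high-frequency adjustment units can be illustrated with a minimal Python sketch. The three step names are taken from the processes listed in the claims (gain adjustment, noise superimposition, sine wave addition); the function name `split_hf_adjustment`, the string identifiers, and the fixed ordering within each stage are assumptions made only for illustration and do not appear in the specification.

```python
from typing import Iterable, List, Tuple

# Processes named in the claims; the identifiers themselves are illustrative.
HF_ADJUSTMENT_STEPS = (
    "gain_adjustment",
    "noise_superimposition",
    "sine_wave_addition",
)


def split_hf_adjustment(primary: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Assign the chosen steps to the primary stage and the rest to the secondary stage.

    This follows the stated preference that the secondary stage performs
    the steps that the primary stage did not perform.
    """
    chosen = set(primary)
    unknown = chosen - set(HF_ADJUSTMENT_STEPS)
    if unknown:
        raise ValueError(f"unknown HF adjustment steps: {sorted(unknown)}")
    primary_steps = [s for s in HF_ADJUSTMENT_STEPS if s in chosen]
    secondary_steps = [s for s in HF_ADJUSTMENT_STEPS if s not in chosen]
    # Both stages are described as performing one or more of the processes.
    if not primary_steps or not secondary_steps:
        raise ValueError("each stage must perform at least one step")
    return primary_steps, secondary_steps


if __name__ == "__main__":
    print(split_hf_adjustment({"gain_adjustment", "noise_superimposition"}))
```

In this sketch, leaving sine wave addition to the secondary stage corresponds to the arrangement recited in the dependent apparatus claim, where the sine wave is added after the time envelope has been deformed.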

(Modification 9 of the fourth embodiment)
The speech decoding device 24i (see FIG. 35) of Modification 9 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 36) stored in a built-in memory of the speech decoding device 24i, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24i in an integrated manner. The communication device of the speech decoding device 24i receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 35, the speech decoding device 24i omits the high-frequency linear prediction analysis unit 2h1 and the linear prediction inverse filter unit 2i1 of the speech decoding device 24h of Modification 8, which, as in the first embodiment, can be omitted throughout the fourth embodiment. It also includes a time envelope deformation unit 2v1 and a time slot selection unit 3a2 in place of the time envelope deformation unit 2v and the time slot selection unit 3a of the speech decoding device 24h of Modification 8. Furthermore, the order of the linear prediction synthesis filtering by the linear prediction filter unit 2k3 and the time envelope deformation by the time envelope deformation unit 2v1, which is interchangeable throughout the fourth embodiment, is swapped.
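The interchangeability of the two processing orders can be illustrated with a small Python sketch. It rests on assumptions that are not stated in this passage: the time envelope deformation is modelled as a single gain applied to every QMF subband sample of one time slot, and the frequency-direction linear prediction synthesis filter is modelled as the all-pole recursion y[k] = x[k] - Σ a[n]·y[k-n] over the subband index k. All function names are hypothetical.

```python
from typing import List, Sequence


def lp_synthesis_over_frequency(
    slot: Sequence[complex], coeffs: Sequence[float]
) -> List[complex]:
    """All-pole filtering along the subband (frequency) index of one time slot.

    Assumed convention: y[k] = x[k] - sum_n coeffs[n-1] * y[k-n].
    """
    out: List[complex] = []
    for k, x in enumerate(slot):
        acc = complex(x)
        for n, a in enumerate(coeffs, start=1):
            if k - n >= 0:
                acc -= a * out[k - n]
        out.append(acc)
    return out


def apply_slot_gain(slot: Sequence[complex], gain: float) -> List[complex]:
    """Multiply every subband sample of one time slot by the same gain."""
    return [gain * x for x in slot]


def filter_then_envelope(slot, coeffs, gain):
    """Synthesis filtering first, time envelope gain second."""
    return apply_slot_gain(lp_synthesis_over_frequency(slot, coeffs), gain)


def envelope_then_filter(slot, coeffs, gain):
    """Time envelope gain first, synthesis filtering second (the swapped order)."""
    return lp_synthesis_over_frequency(apply_slot_gain(slot, gain), coeffs)


if __name__ == "__main__":
    slot = [1, 0, 0]
    print(filter_then_envelope(slot, [-0.5], 2.0))
    print(envelope_then_filter(slot, [-0.5], 2.0))
```

Under these assumptions the filter is linear within a time slot and the gain is constant across that slot, so the two orders agree up to floating-point rounding. If the gain varied across subbands within a slot, the two orders would in general produce different outputs.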

(Modification 10 of the fourth embodiment)
The speech decoding device 24j (see FIG. 37) of Modification 10 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 36) stored in a built-in memory of the speech decoding device 24j, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24j in an integrated manner. The communication device of the speech decoding device 24j receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 37, the speech decoding device 24j omits the signal change detection unit 2e1, the high-frequency linear prediction analysis unit 2h1, and the linear prediction inverse filter unit 2i1 of the speech decoding device 24h of Modification 8, which, as in the first embodiment, can be omitted throughout the fourth embodiment. It also includes a time envelope deformation unit 2v1 and a time slot selection unit 3a2 in place of the time envelope deformation unit 2v and the time slot selection unit 3a of the speech decoding device 24h of Modification 8. Furthermore, the order of the linear prediction synthesis filtering by the linear prediction filter unit 2k3 and the time envelope deformation by the time envelope deformation unit 2v1, which is interchangeable throughout the fourth embodiment, is swapped.

(Modification 11 of the fourth embodiment)
The speech decoding device 24k (see FIG. 38) of Modification 11 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 39) stored in a built-in memory of the speech decoding device 24k, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24k in an integrated manner. The communication device of the speech decoding device 24k receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 38, the speech decoding device 24k includes a bitstream separation unit 2a7 and a time slot selection unit 3a1 in place of the bitstream separation unit 2a3 and the time slot selection unit 3a of the speech decoding device 24h of Modification 8.

(Modification 12 of the fourth embodiment)
The speech decoding device 24q (see FIG. 40) of Modification 12 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 41) stored in a built-in memory of the speech decoding device 24q, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24q in an integrated manner. The communication device of the speech decoding device 24q receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 40, the speech decoding device 24q includes a low-frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high-frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and individual signal component adjustment units 2z4, 2z5, and 2z6 in place of the low-frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high-frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the individual signal component adjustment units 2z1, 2z2, and 2z3 of the speech decoding device 24c of Modification 3 (the individual signal component adjustment units correspond to the time envelope deformation means), and further includes a time slot selection unit 3a.

At least one of the individual signal component adjustment units 2z4, 2z5, and 2z6 processes a signal component included in the output of the primary high-frequency adjustment unit in the same manner as the individual signal component adjustment units 2z1, 2z2, and 2z3, but applies the processing to the QMF-domain signal of the time slots selected according to the selection result notified by the time slot selection unit 3a (the processing of step Sn1). The processing performed using the time slot selection information preferably includes at least one of those processes of the individual signal component adjustment units 2z1, 2z2, and 2z3 described in Modification 3 of the fourth embodiment that involve linear prediction synthesis filtering in the frequency direction.
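Restricting a per-slot process to the selected time slots can be sketched as follows. This is a minimal illustration, assuming the QMF-domain signal is held as a list of time slots, each a list of subband samples, and that the selection result is a set of time slot indices. The function name `process_selected_slots` and the `process` callback are hypothetical; in the device described here the callback would stand for a process such as the frequency-direction linear prediction synthesis filtering.

```python
from typing import Callable, List, Sequence, Set

Slot = List[complex]


def process_selected_slots(
    qmf: Sequence[Sequence[complex]],
    selected: Set[int],
    process: Callable[[Sequence[complex]], Slot],
) -> List[Slot]:
    """Apply `process` to the selected time slots and pass the others through.

    qmf[r][k] is the sample of subband k in time slot r (assumed layout).
    """
    return [
        process(slot) if r in selected else list(slot)
        for r, slot in enumerate(qmf)
    ]


if __name__ == "__main__":
    qmf = [[1, 2], [3, 4], [5, 6]]
    # Placeholder process: double every subband sample of the slot.
    doubled = process_selected_slots(qmf, {0, 2}, lambda s: [2 * x for x in s])
    print(doubled)
```

Passing a different `selected` set to each call mirrors the case in which each individual signal component adjustment unit receives its own selection result.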

The processing in the individual signal component adjustment units 2z4, 2z5, and 2z6 may be identical to one another, as with the processing of the individual signal component adjustment units 2z1, 2z2, and 2z3 described in Modification 3 of the fourth embodiment. Alternatively, the individual signal component adjustment units 2z4, 2z5, and 2z6 may deform the time envelope of each of the plurality of signal components included in the output of the primary high-frequency adjustment unit by mutually different methods. (If none of the individual signal component adjustment units 2z4, 2z5, and 2z6 bases its processing on the selection result notified by the time slot selection unit 3a, the configuration is equivalent to Modification 3 of the fourth embodiment of the present invention.)

The time slot selection results notified from the time slot selection unit 3a to the individual signal component adjustment units 2z4, 2z5, and 2z6 need not all be the same; all or some of them may differ.

Furthermore, although FIG. 40 shows a configuration in which a single time slot selection unit 3a notifies each of the individual signal component adjustment units 2z4, 2z5, and 2z6 of the time slot selection result, a plurality of time slot selection units may be provided that notify different time slot selection results to each, or to some, of the individual signal component adjustment units 2z4, 2z5, and 2z6. In that case, the time slot selection unit serving whichever of the individual signal component adjustment units 2z4, 2z5, and 2z6 performs process 4 described in Modification 3 of the fourth embodiment may receive time slot selection information from the time envelope deformation unit and perform its time slot selection on that basis. Process 4 first multiplies each QMF subband sample of the input signal by a gain coefficient using the time envelope obtained from the envelope shape adjustment unit 2s, as the time envelope deformation unit 2v does, and then applies to the result linear prediction synthesis filtering in the frequency direction using the linear prediction coefficients obtained from the filter strength adjustment unit 2f, as the linear prediction filter unit 2k does.

(Modification 13 of the fourth embodiment)
The speech decoding device 24m (see FIG. 42) of Modification 13 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program (for example, a computer program for performing the processing shown in the flowchart of FIG. 43) stored in a built-in memory of the speech decoding device 24m, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24m in an integrated manner. The communication device of the speech decoding device 24m receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. As shown in FIG. 42, the speech decoding device 24m includes a bitstream separation unit 2a7 and a time slot selection unit 3a1 in place of the bitstream separation unit 2a3 and the time slot selection unit 3a of the speech decoding device 24q of Modification 12.

(Modification 14 of the fourth embodiment)
The speech decoding device 24n (not illustrated) of Modification 14 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24n, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24n in an integrated manner. The communication device of the speech decoding device 24n receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. Functionally, the speech decoding device 24n includes a low-frequency linear prediction analysis unit 2d1, a signal change detection unit 2e1, a high-frequency linear prediction analysis unit 2h1, a linear prediction inverse filter unit 2i1, and a linear prediction filter unit 2k3 in place of the low-frequency linear prediction analysis unit 2d, the signal change detection unit 2e, the high-frequency linear prediction analysis unit 2h, the linear prediction inverse filter unit 2i, and the linear prediction filter unit 2k of the speech decoding device 24a of Modification 1, and further includes a time slot selection unit 3a.

(Modification 15 of the fourth embodiment)
The speech decoding device 24p (not illustrated) of Modification 15 of the fourth embodiment physically includes a CPU, a ROM, a RAM, a communication device, and the like (not shown). The CPU loads a predetermined computer program stored in a built-in memory of the speech decoding device 24p, such as the ROM, into the RAM and executes it, thereby controlling the speech decoding device 24p in an integrated manner. The communication device of the speech decoding device 24p receives the encoded multiplexed bitstream and outputs the decoded speech signal to the outside. Functionally, the speech decoding device 24p includes a time slot selection unit 3a1 in place of the time slot selection unit 3a of the speech decoding device 24n of Modification 14. It further includes a bitstream separation unit 2a8 (not illustrated) in place of the bitstream separation unit 2a4.

Like the bitstream separation unit 2a4, the bitstream separation unit 2a8 separates the multiplexed bitstream into the SBR auxiliary information and the encoded bitstream, and additionally separates out the time slot selection information.

11, 11a, 11b, 11c, 12, 12a, 12b, 13, 14, 14a, 14b ... speech encoding device; 1a ... frequency conversion unit; 1b ... frequency inverse conversion unit; 1c ... core codec encoding unit; 1d ... SBR encoding unit; 1e, 1e1 ... linear prediction analysis unit; 1f ... filter strength parameter calculation unit; 1f1 ... filter strength parameter calculation unit; 1g, 1g1, 1g2, 1g3, 1g4, 1g5, 1g6, 1g7 ... bitstream multiplexing unit; 1h ... high-frequency frequency inverse conversion unit; 1i ... short-time power calculation unit; 1j ... linear prediction coefficient decimation unit; 1k ... linear prediction coefficient quantization unit; 1m ... time envelope calculation unit; 1n ... envelope shape parameter calculation unit; 1p, 1p1 ... time slot selection unit; 21, 22, 23, 24, 24b, 24c ... speech decoding device; 2a, 2a1, 2a2, 2a3, 2a5, 2a6, 2a7 ... bitstream separation unit; 2b ... core codec decoding unit; 2c ... frequency conversion unit; 2d, 2d1 ... low-frequency linear prediction analysis unit; 2e, 2e1 ... signal change detection unit; 2f ... filter strength adjustment unit; 2g ... high-frequency generation unit; 2h, 2h1 ... high-frequency linear prediction analysis unit; 2i, 2i1 ... linear prediction inverse filter unit; 2j, 2j1, 2j2, 2j3, 2j4 ... high-frequency adjustment unit; 2k, 2k1, 2k2, 2k3 ... linear prediction filter unit; 2m ... coefficient addition unit; 2n ... frequency inverse conversion unit; 2p, 2p1 ... linear prediction coefficient interpolation/extrapolation unit; 2r ... low-frequency time envelope calculation unit; 2s ... envelope shape adjustment unit; 2t ... high-frequency time envelope calculation unit; 2u ... time envelope flattening unit; 2v, 2v1 ... time envelope deformation unit; 2w ... auxiliary information conversion unit; 2z1, 2z2, 2z3, 2z4, 2z5, 2z6 ... individual signal component adjustment unit; 3a, 3a1, 3a2 ... time slot selection unit.

Claims

A speech decoding apparatus for decoding an encoded speech signal, the apparatus comprising:
Bitstream separation means for separating an external bitstream including the encoded speech signal into an encoded bitstream and time envelope auxiliary information;
Core decoding means for decoding the encoded bitstream separated by the bitstream separation means to obtain a low-frequency component;
Frequency converting means for converting the low-frequency component obtained by the core decoding means into a frequency domain;
High-frequency generating means for generating a high-frequency component by copying the low-frequency component converted into the frequency domain by the frequency converting means from a low-frequency band to a high-frequency band;
Primary high-frequency adjusting means for generating an output signal by executing a part of processing that includes gain adjustment, noise superimposition, and sine wave addition on the high-frequency component generated by the high-frequency generating means;
Low-frequency time envelope analyzing means for analyzing the low-frequency component converted into the frequency domain by the frequency converting means to obtain time envelope information;
Auxiliary information converting means for converting the time envelope auxiliary information into a parameter for adjusting the time envelope information;
Time envelope adjusting means for adjusting the time envelope information acquired by the low-frequency time envelope analyzing means to generate adjusted time envelope information, the time envelope adjusting means using the parameter to adjust the time envelope information;
Time envelope deforming means for deforming, using the adjusted time envelope information, the time envelope of the output signal generated by the primary high-frequency adjusting means, thereby generating an output signal; and
Secondary high-frequency adjusting means for executing another part of the processing that includes gain adjustment, noise superimposition, and sine wave addition on the output signal generated by the time envelope deforming means.

A speech decoding apparatus for decoding an encoded speech signal, the apparatus comprising:
Core decoding means for decoding an external bitstream including the encoded speech signal to obtain a low-frequency component;
Frequency converting means for converting the low-frequency component obtained by the core decoding means into a frequency domain;
High-frequency generating means for generating a high-frequency component by copying the low-frequency component converted into the frequency domain by the frequency converting means from a low-frequency band to a high-frequency band;
Primary high-frequency adjusting means for generating an output signal by executing a part of processing that includes gain adjustment, noise superimposition, and sine wave addition on the high-frequency component generated by the high-frequency generating means;
Low-frequency time envelope analyzing means for analyzing the low-frequency component converted into the frequency domain by the frequency converting means to obtain time envelope information;
Time envelope auxiliary information generating means for analyzing the bitstream to generate a parameter for adjusting the time envelope information;
Time envelope adjusting means for adjusting the time envelope information acquired by the low-frequency time envelope analyzing means to generate adjusted time envelope information, the time envelope adjusting means using the parameter to adjust the time envelope information;
Time envelope deforming means for deforming, using the adjusted time envelope information, the time envelope of the output signal generated by the primary high-frequency adjusting means, thereby generating an output signal; and
Secondary high-frequency adjusting means for executing another part of the processing that includes gain adjustment, noise superimposition, and sine wave addition on the output signal generated by the time envelope deforming means.

3. The speech decoding apparatus according to claim 1, wherein the secondary high-frequency adjusting means executes the sine wave addition of the SBR decoding process on the output signal generated by the time envelope deforming means.

A speech decoding method using a speech decoding apparatus that decodes an encoded speech signal, the method comprising:
A bitstream separation step in which the speech decoding apparatus separates an external bitstream including the encoded speech signal into an encoded bitstream and time envelope auxiliary information;
A core decoding step in which the speech decoding apparatus obtains a low-frequency component by decoding the encoded bitstream separated in the bitstream separation step;
A frequency conversion step in which the speech decoding apparatus converts the low-frequency component obtained in the core decoding step into a frequency domain;
A high-frequency generation step in which the speech decoding apparatus generates a high-frequency component by copying the low-frequency component converted into the frequency domain in the frequency conversion step from a low-frequency band to a high-frequency band;
A primary high-frequency adjustment step in which the speech decoding apparatus generates an output signal by executing a part of processing that includes gain adjustment, noise superimposition, and sine wave addition on the high-frequency component generated in the high-frequency generation step;
A low-frequency time envelope analysis step in which the speech decoding apparatus acquires time envelope information by analyzing the low-frequency component converted into the frequency domain in the frequency conversion step;
An auxiliary information conversion step in which the speech decoding apparatus converts the time envelope auxiliary information into a parameter for adjusting the time envelope information;
A time envelope adjustment step in which the speech decoding apparatus adjusts the time envelope information acquired in the low-frequency time envelope analysis step to generate adjusted time envelope information, the parameter being used to adjust the time envelope information;
A time envelope deformation step in which the speech decoding apparatus deforms, using the adjusted time envelope information, the time envelope of the output signal generated in the primary high-frequency adjustment step, thereby generating an output signal; and
A secondary high-frequency adjustment step in which the speech decoding apparatus executes another part of the processing that includes gain adjustment, noise superimposition, and sine wave addition on the output signal generated in the time envelope deformation step.

A speech decoding method using a speech decoding apparatus that decodes an encoded speech signal, the method comprising:
A core decoding step in which the speech decoding apparatus obtains a low-frequency component by decoding an external bitstream including the encoded speech signal;
A frequency conversion step in which the speech decoding apparatus converts the low-frequency component obtained in the core decoding step into a frequency domain;
A high-frequency generation step in which the speech decoding apparatus generates a high-frequency component by copying the low-frequency component converted into the frequency domain in the frequency conversion step from a low-frequency band to a high-frequency band;
A primary high-frequency adjustment step in which the speech decoding apparatus generates an output signal by executing a part of processing that includes gain adjustment, noise superimposition, and sine wave addition on the high-frequency component generated in the high-frequency generation step;
A low-frequency time envelope analysis step in which the speech decoding apparatus acquires time envelope information by analyzing the low-frequency component converted into the frequency domain in the frequency conversion step;
A time envelope auxiliary information generation step in which the speech decoding apparatus analyzes the bitstream to generate a parameter for adjusting the time envelope information;
A time envelope adjustment step in which the speech decoding apparatus adjusts the time envelope information acquired in the low-frequency time envelope analysis step to generate adjusted time envelope information, the parameter being used to adjust the time envelope information;
A time envelope deformation step in which the speech decoding apparatus deforms, using the adjusted time envelope information, the time envelope of the output signal generated in the primary high-frequency adjustment step, thereby generating an output signal; and
A secondary high-frequency adjustment step in which the speech decoding apparatus executes another part of the processing that includes gain adjustment, noise superimposition, and sine wave addition on the output signal generated in the time envelope deformation step.

In order to decode the encoded audio signal, a computer device is
Bitstream separation means for separating an external bitstream including the encoded audio signal into an encoded bitstream and time envelope auxiliary information;
Core decoding means for decoding the encoded bitstream separated by the bitstream separation means to obtain a low frequency component;
Frequency converting means for converting the low frequency component obtained by the core decoding means into a frequency domain;
High frequency generation means for generating a high frequency component by copying the low frequency component converted into the frequency domain by the frequency conversion means from a low frequency band to a high frequency band;
Primary high-frequency adjusting means for generating an output signal by executing a part of processing including gain adjustment, noise superimposition, and sine wave addition processing on the high-frequency component generated by the high-frequency generating means When,
Low frequency time envelope analyzing means for analyzing the low frequency component converted into the frequency domain by the frequency converting means to obtain time envelope information;
Auxiliary information converting means for converting the time envelope auxiliary information into a parameter for adjusting the time envelope information;
Time envelope adjusting means for adjusting the time envelope information acquired by the low frequency time envelope analyzing means to generate adjusted time envelope information, using the parameter for the adjustment;
Time envelope deformation means for deforming, using the adjusted time envelope information, the time envelope of the output signal generated by the primary high-frequency adjusting means, thereby generating an output signal;
Secondary high frequency adjustment means for performing the other part of the processing including gain adjustment, noise superimposition, and sine wave addition processing on the output signal generated by the time envelope deformation means;
A speech decoding program for causing the computer device to function as each of the above means.

In order to decode an encoded audio signal, a computer device is made to function as the following means:
Core decoding means for decoding an external bitstream including the encoded audio signal to obtain a low frequency component;
Frequency converting means for converting the low frequency component obtained by the core decoding means into a frequency domain;
High frequency generation means for generating a high frequency component by copying the low frequency component converted into the frequency domain by the frequency conversion means from a low frequency band to a high frequency band;
Primary high-frequency adjusting means for generating an output signal by executing a part of the processing including gain adjustment, noise superimposition, and sine wave addition processing on the high-frequency component generated by the high-frequency generating means;
Low frequency time envelope analyzing means for analyzing the low frequency component converted into the frequency domain by the frequency converting means to obtain time envelope information;
Time envelope auxiliary information generating means for analyzing the bitstream and generating a parameter for adjusting the time envelope information;
Time envelope adjusting means for adjusting the time envelope information acquired by the low frequency time envelope analyzing means to generate adjusted time envelope information, using the parameter for the adjustment;
Time envelope deformation means for deforming, using the adjusted time envelope information, the time envelope of the output signal generated by the primary high-frequency adjusting means, thereby generating an output signal;
Secondary high frequency adjustment means for performing the other part of the processing including gain adjustment, noise superimposition, and sine wave addition processing on the output signal generated by the time envelope deformation means;
A speech decoding program for causing the computer device to function as each of the above means.
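The ordering of the high frequency stages recited above (copy-up, primary adjustment, time envelope deformation, secondary adjustment) can be sketched as a short processing chain. The claims do not state which of gain adjustment, noise superimposition, and sine wave addition belongs to the primary or the secondary stage, so the split used here (gain before the deformation, sinusoid after it) is purely an assumption for illustration, as are all function names. The sinusoid is simplified to a constant added to one subband, and signals are `signal[k][t]` with subband `k` and time slot `t`.

```python
def generate_high_band(low_band, num_high_bands):
    """Copy low-frequency subbands upward, cycling through them if needed."""
    return [list(low_band[k % len(low_band)]) for k in range(num_high_bands)]


def primary_adjust(high_band, gain):
    """Assumed 'part' of the adjustment: apply a single gain to all samples."""
    return [[gain * x for x in band] for band in high_band]


def deform(high_band, envelope):
    """Scale each time slot t of every subband by envelope[t]."""
    return [[x * g for x, g in zip(band, envelope)] for band in high_band]


def secondary_adjust(high_band, sinusoid_band, amplitude):
    """Assumed 'other part': add a constant-amplitude component to one subband.

    Because this runs after deform(), the added component is not shaped
    by the time envelope.
    """
    out = [list(band) for band in high_band]
    out[sinusoid_band] = [x + amplitude for x in out[sinusoid_band]]
    return out


def decode_high_band(low_band, envelope, num_high_bands, gain, sinusoid_band, amplitude):
    """Run the four stages in the order recited in the claim."""
    high = generate_high_band(low_band, num_high_bands)
    high = primary_adjust(high, gain)
    high = deform(high, envelope)
    return secondary_adjust(high, sinusoid_band, amplitude)


if __name__ == "__main__":
    result = decode_high_band(
        low_band=[[1 + 0j, 1 + 0j]],
        envelope=[0.5, 1.5],
        num_high_bands=2,
        gain=2.0,
        sinusoid_band=1,
        amplitude=1.0,
    )
    print(result)
```

Under this assumed split, whatever is added in the secondary stage keeps a steady level across time slots while the copied-up component follows the adjusted envelope, which is one plausible motivation for dividing the adjustment around the deformation.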