JP2004272292A

JP2004272292A - Sound signal processing method

Info

Publication number: JP2004272292A
Application number: JP2004158788A
Authority: JP
Inventors: Hirohisa Tazaki; 裕久田崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1997-12-08
Filing date: 2004-05-28
Publication date: 2004-09-30
Anticipated expiration: 2018-12-07
Also published as: JP4230414B2

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus for processing an input sound signal containing degraded sound, such as quantization noise, so that the degraded sound becomes subjectively imperceptible. SOLUTION: The decoded sound, taken as the input sound signal, is perceptually weighted and its spectrum is computed; a transformation strength is then calculated from the amplitude magnitude and the continuity of that spectrum. In a signal transformation section, the spectrum of the decoded sound is obtained, amplitude smoothing and phase disturbance are applied according to the transformation strength, and the spectrum is returned to the signal domain to yield a transformed decoded sound. In a signal evaluation section, the decoded sound is analyzed to obtain its background-noise likeness, which serves as the addition control value. In a weighted addition section, when the addition control value indicates background-noise likeness, the weight applied to the decoded sound is reduced and the weight applied to the transformed decoded sound is increased, and the two are added to obtain the output sound. COPYRIGHT: (C)2004,JPO&NCIPI

Description

The present invention relates to a sound signal processing method and a sound signal processing apparatus that process subjectively undesirable components, such as quantization noise generated by encoding and decoding speech or musical sound and distortion caused by various signal processing operations such as noise suppression, so that they are subjectively less perceptible.

As the compression ratio of source coding for speech and musical sound is raised, quantization noise, the distortion introduced by coding, gradually increases and changes in character until it becomes subjectively intolerable. For example, in speech coding schemes that try to represent the signal itself faithfully, such as PCM (Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation), the quantization noise is random-like and subjectively not very noticeable. As the compression ratio rises and the coding scheme becomes more complex, however, spectral characteristics peculiar to the coding scheme appear in the quantization noise, which can cause severe subjective degradation. In signal sections dominated by background noise in particular, the speech model used by a high-compression speech coding scheme does not fit, and the result is very unpleasant to listen to.

Likewise, when noise suppression such as spectral subtraction is applied, errors in the noise estimate remain in the processed signal as distortion. Because this distortion differs markedly in character from the signal before processing, it can greatly degrade the subjective evaluation.

Conventional methods for suppressing the loss of subjective quality caused by such quantization noise and distortion are disclosed in JP-A-8-130513, JP-A-8-146998, JP-A-7-160296, JP-A-6-326670, JP-A-7-248793, and S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Trans. on ASSP, Vol. ASSP-27, No. 2, pp. 113-120, April 1979) (hereinafter referred to as Document 1).

JP-A-8-130513 aims to improve quality in background noise sections. It determines whether a section contains only background noise and applies dedicated encoding or decoding to such sections; when decoding a background-noise-only section, it suppresses the characteristics of the synthesis filter to obtain a perceptually natural reproduced sound.

JP-A-8-146998 aims to keep white noise from acquiring a harsh timbre through encoding and decoding, and does so by adding white noise or previously stored background noise to the decoded speech.

JP-A-7-160296 aims to reduce quantization noise perceptually. It derives an auditory masking threshold from the decoded speech or from the spectral-parameter index received by the speech decoding unit, computes filter coefficients that reflect this threshold, and uses those coefficients in a post filter.

JP-A-6-326670 concerns systems that stop code transmission in sections containing no speech, for purposes such as communication power control. In such systems the decoding side generates and outputs pseudo background noise while no code is transmitted. To reduce the audible mismatch between the actual background noise contained in speech sections and the pseudo background noise of silent sections, the method superimposes the pseudo background noise on speech sections as well as on sections without speech.

JP-A-7-248793 aims to perceptually reduce the distortion produced by noise suppression. The encoding side first determines whether a section is a noise section or a speech section, then transmits the noise spectrum in noise sections and the noise-suppressed spectrum in speech sections. The decoding side, in noise sections, generates and outputs a synthesized sound from the received noise spectrum. In speech sections, it generates a synthesized sound from the received noise-suppressed spectrum, adds to it the synthesized sound generated from the noise spectrum received in the noise sections multiplied by a superimposition factor, and outputs the sum.

Document 1 aims to perceptually reduce the distortion produced by noise suppression. It smooths the amplitude spectrum of the noise-suppressed output speech across temporally adjacent sections, and additionally applies amplitude suppression in background noise sections only.

JP-A-8-130513; JP-A-8-146998; JP-A-7-160296; JP-A-6-326670; JP-A-7-248793; S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Trans. on ASSP, Vol. ASSP-27, No. 2, pp. 113-120, April 1979).

The conventional methods described above have the following problems.

In JP-A-8-130513, the encoding and decoding processes are switched drastically according to the section determination result, so the characteristics change abruptly at the boundary between a noise section and a speech section. In particular, if noise sections are frequently misjudged as speech sections, a noise section that is inherently fairly stationary fluctuates unstably, and the noise section may actually be degraded. If the noise-section determination result is transmitted, extra information must be sent, and errors in that information on the transmission path cause unnecessary degradation. Moreover, merely suppressing the characteristics of the synthesis filter does not reduce the quantization noise that arises in excitation coding, so for some noise types almost no improvement is obtained.

In JP-A-8-146998, noise prepared in advance is added, so the characteristics of the background noise actually being encoded are lost. To make the degraded sound hard to hear, noise at a level exceeding that of the degraded sound must be added, so the reproduced background noise becomes louder.

JP-A-7-160296 merely derives an auditory masking threshold from the spectral parameters and applies a spectral post filter based on it. For background noise with a relatively flat spectrum, almost no components are masked, so no improvement is obtained. In addition, the dominant components that are not masked cannot be changed much, so distortion contained in those components is not improved at all.

In JP-A-6-326670, the pseudo background noise is generated without regard to the actual background noise, so the characteristics of the actual background noise are lost.

In JP-A-7-248793, the encoding and decoding processes are switched drastically according to the section determination result, so a wrong noise/speech decision causes severe degradation. If part of a noise section is mistaken for a speech section, the sound quality within the noise section fluctuates discontinuously and becomes unpleasant. Conversely, if a speech section is mistaken for a noise section, speech components leak into both the synthesized sound of the noise section, which uses the average noise spectrum, and the synthesized sound superimposed in speech sections, which uses the noise spectrum, degrading the sound quality overall. Furthermore, making the degraded sound in speech sections inaudible requires superimposing noise that is by no means small.

Document 1 incurs a processing delay of half a section (about 10 ms to 20 ms) because of the smoothing. In addition, if part of a noise section is misjudged as a speech section, the sound quality within the noise section fluctuates discontinuously and becomes unpleasant.

The present invention has been made to solve these problems. Its object is to provide a sound signal processing method and a sound signal processing apparatus that suffer little degradation from section determination errors, depend little on noise type or spectral shape, require no large delay, retain the characteristics of the actual background noise, do not raise the background noise level excessively, need no additional transmitted information, and also suppress degradation components arising from excitation coding and the like.

The sound signal processing method of the present invention processes an input sound signal to generate a first processed signal, analyzes the input sound signal to calculate a predetermined evaluation value, adds the input sound signal and the first processed signal with weights based on this evaluation value to obtain a second processed signal, and outputs the second processed signal as the output signal.
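The weighted addition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `weighted_add`, the convention that the evaluation value lies in [0, 1], and the use of complementary weights (1 - w and w) are not taken from the source, which only says that the weights depend on the evaluation value.

```python
def weighted_add(input_frame, processed_frame, evaluation):
    """Blend the input signal with the first processed signal.

    Assumption: `evaluation` is a continuous value in [0, 1], where 1 means
    "use only the processed signal". The source does not fix this mapping.
    """
    w = min(1.0, max(0.0, evaluation))  # clamp to [0, 1] (assumed range)
    # Second processed signal: complementary weights, sample by sample.
    return [(1.0 - w) * x + w * y for x, y in zip(input_frame, processed_frame)]
```

With complementary weights, raising the share of the processed signal lowers the share of the input signal at the same time, so the overall level does not need to grow.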

Further, the first processed signal is generated by Fourier-transforming the input sound signal to calculate a spectral component for each frequency, applying a predetermined modification to those spectral components, and inverse-Fourier-transforming the modified spectral components.
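A minimal sketch of the transform, modify, inverse-transform chain is given below. It uses a plain O(N^2) discrete Fourier transform written with the standard library so that it stays self-contained; a real implementation would use an FFT. The names `dft`, `idft`, `process_frame`, and the `modify` callback are hypothetical, and windowing and frame overlap are left out.

```python
import cmath

def dft(frame):
    """Discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse transform; only the real part is kept (real signal assumed)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def process_frame(frame, modify):
    """Fourier transform, apply `modify` to the spectrum, transform back."""
    return idft(modify(dft(frame)))
```

Passing the identity function as `modify` returns the input frame up to rounding error, which is a convenient check before plugging in a real modification.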

Further, the weighted addition is performed in the spectral domain.

Further, the weighted addition is controlled independently for each frequency component.
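Weighted addition in the spectral domain with an independent weight per frequency component might look like the sketch below. The function name `weighted_add_per_bin` and the assumption that each weight lies in [0, 1] are illustrative only; how the per-bin weights are derived is not specified in this passage.

```python
def weighted_add_per_bin(input_spec, processed_spec, weights):
    """Blend two complex spectra bin by bin.

    Assumption: weights[k] in [0, 1] is the share of the processed spectrum
    at frequency bin k; the input spectrum gets the complementary share.
    """
    return [(1.0 - w) * a + w * b
            for a, b, w in zip(input_spec, processed_spec, weights)]
```

A bin with weight 0 keeps the input component untouched, so components that are already clean can be passed through while only the degraded ones are replaced.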

Further, the predetermined modification applied to the spectral component of each frequency includes smoothing of the amplitude spectrum components.
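One possible form of amplitude smoothing is sketched below. The passage does not say along which axis or with which kernel the smoothing is done, so the three-point circular moving average along the frequency axis, the `strength` parameter in [0, 1], and the name `smooth_amplitude` are all assumptions made for illustration. The phase of each component is left unchanged.

```python
import cmath

def smooth_amplitude(spectrum, strength):
    """Pull each bin's amplitude toward the mean of itself and its neighbours.

    Assumptions: smoothing runs along the frequency axis with circular
    neighbours, and strength in [0, 1] interpolates between the original
    amplitude (0) and the three-point average (1).
    """
    n = len(spectrum)
    amps = [abs(c) for c in spectrum]
    out = []
    for k, c in enumerate(spectrum):
        average = (amps[(k - 1) % n] + amps[k] + amps[(k + 1) % n]) / 3.0
        target = (1.0 - strength) * amps[k] + strength * average
        out.append(cmath.rect(target, cmath.phase(c)))  # keep the original phase
    return out
```

Because the neighbours are taken circularly, the amplitude symmetry of a real signal's spectrum is preserved.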

Further, the predetermined modification applied to the spectral component of each frequency includes adding a disturbance to the phase spectrum components.
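Phase disturbance could be sketched as a random rotation of each spectral component, as below. The uniform distribution, the scaling of the rotation range by `strength`, and the name `disturb_phase` are assumptions, not details from the source. The code also assumes the spectrum comes from a real-valued frame, so it mirrors each rotated bin onto its conjugate partner to keep the inverse transform real.

```python
import cmath
import random

def disturb_phase(spectrum, strength, rng=None):
    """Rotate each bin by a random angle in [-strength*pi, +strength*pi].

    Assumptions: strength in [0, 1]; the input is the DFT of a real frame,
    so bin n-k is rebuilt as the conjugate of bin k. Bin 0 and, for even n,
    the Nyquist bin are left untouched.
    """
    if rng is None:
        rng = random.Random()
    n = len(spectrum)
    out = list(spectrum)
    for k in range(1, (n + 1) // 2):
        delta = strength * rng.uniform(-cmath.pi, cmath.pi)
        out[k] = spectrum[k] * cmath.exp(1j * delta)  # amplitude is unchanged
        out[n - k] = out[k].conjugate()
    return out
```

Only the relationships between phase components change; the amplitude spectrum is left as it was.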

Further, the smoothing strength of the smoothing process is controlled according to the magnitude of the amplitude spectrum components of the input sound signal.

Further, the disturbance strength of the disturbance-adding process is controlled according to the magnitude of the amplitude spectrum components of the input sound signal.

Further, the smoothing strength of the smoothing process is controlled according to the degree of temporal continuity of the spectral components of the input sound signal.

Further, the disturbance strength of the disturbance-adding process is controlled according to the degree of temporal continuity of the spectral components of the input sound signal.
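The control of smoothing and disturbance strength by amplitude and temporal continuity can be illustrated with the sketch below. The specific measures are hypothetical: "smallness" compares each bin with the frame's mean amplitude, "continuity" compares each bin with the same bin of the previous frame, and the two are combined with `max`. The source states only that the strength depends on these two quantities, not how.

```python
def transform_strength(amps, prev_amps, floor=1e-12):
    """Per-bin strength in [0, 1]: high for small or discontinuous components.

    Assumptions: amps and prev_amps are amplitude spectra of the current and
    previous frame; the formulas below are illustrative, not from the source.
    """
    mean = sum(amps) / len(amps) + floor
    strengths = []
    for a, p in zip(amps, prev_amps):
        smallness = 1.0 - min(1.0, a / mean)              # 1 when the bin is near zero
        continuity = min(a, p) / (max(a, p) + floor)      # 1 when the frames agree
        strengths.append(max(smallness, 1.0 - continuity))
    return strengths
```

Under these assumptions a strong, steady component receives a strength near 0 and is left almost untouched, while a weak or fluctuating one is processed heavily.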

Further, a perceptually weighted input sound signal is used as the input sound signal.

Further, the smoothing strength of the smoothing process is controlled according to the magnitude of the temporal variability of the evaluation value.

Further, the disturbance strength of the disturbance-adding process is controlled according to the magnitude of the temporal variability of the evaluation value.

Further, the degree of background-noise likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.
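A continuous background-noise likeness could be derived, for example, from how far the frame power lies above a running estimate of the background noise power. The sketch below is one such construction under stated assumptions: the decibel scale, the 20 dB span, the update rate, and the names `frame_power_db`, `noise_likeness`, and `update_noise_estimate` are all hypothetical and are not taken from the source.

```python
import math

def frame_power_db(frame, floor=1e-12):
    """Mean power of a frame in dB (floor avoids log of zero)."""
    return 10.0 * math.log10(sum(x * x for x in frame) / len(frame) + floor)

def noise_likeness(power_db, noise_db, span_db=20.0):
    """1.0 at or below the noise estimate, falling linearly to 0.0 at +span_db."""
    excess = power_db - noise_db
    return min(1.0, max(0.0, 1.0 - excess / span_db))

def update_noise_estimate(noise_db, power_db, likeness, rate=0.1):
    """Move the noise estimate toward the frame power, faster when noise-like."""
    return noise_db + rate * likeness * (power_db - noise_db)
```

Because the likeness is a continuous quantity rather than a yes/no decision, a borderline frame changes the output mix only slightly.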

Further, the degree of fricative likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.

Further, decoded speech obtained by decoding a speech code generated by speech encoding is used as the input sound signal.

The sound signal processing method of the present invention takes as the input sound signal a first decoded speech obtained by decoding a speech code generated by speech encoding. Post-filter processing is applied to the first decoded speech to generate a second decoded speech, the first decoded speech is processed to generate a first processed speech, one of the decoded speeches is analyzed to calculate a predetermined evaluation value, the second decoded speech and the first processed speech are added with weights based on this evaluation value to obtain a second processed speech, and the second processed speech is output as the output speech.
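The data flow of this post-filter variant can be sketched as below. The three callables `postfilter`, `transform`, and `evaluate` are placeholders for processing that is not defined in this passage, and evaluating the unfiltered decoded speech with a weight in [0, 1] is one of the options the text allows, chosen here as an assumption.

```python
def decode_with_postfilter(decoded, postfilter, transform, evaluate):
    """Mix post-filtered speech with processed speech made from unfiltered speech.

    Assumptions: `evaluate` returns a weight in [0, 1] and is applied to the
    first (unfiltered) decoded speech; complementary weights are used.
    """
    filtered = postfilter(decoded)    # second decoded speech
    processed = transform(decoded)    # first processed speech (no post filter)
    w = evaluate(decoded)             # evaluation value
    return [(1.0 - w) * f + w * p for f, p in zip(filtered, processed)]
```

The point of this arrangement is that the processed speech and the evaluation value are both computed without the post filter's influence, while the listener still hears post-filtered speech where little processing is applied.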

The sound signal processing apparatus of the present invention comprises a first processed signal generation unit that processes an input sound signal to generate a first processed signal, an evaluation value calculation unit that analyzes the input sound signal to calculate a predetermined evaluation value, and a second processed signal generation unit that adds the input sound signal and the first processed signal with weights based on the evaluation value from the evaluation value calculation unit and outputs the result as a second processed signal.

Further, the first processed signal generation unit Fourier-transforms the input sound signal to calculate a spectral component for each frequency, smooths the amplitude spectrum components of the calculated spectral components, and inverse-Fourier-transforms the smoothed spectral components to generate the first processed signal.

Further, the first processed signal generation unit Fourier-transforms the input sound signal to calculate a spectral component for each frequency, adds a disturbance to the phase spectrum components of the calculated spectral components, and inverse-Fourier-transforms the disturbed spectral components to generate the first processed signal.

As described above, the sound signal processing method and apparatus of the present invention apply predetermined signal processing to the input signal to generate a processed signal in which the degradation components of the input signal are subjectively unobtrusive, and control the addition weights of the input signal and the processed signal by a predetermined evaluation value. The proportion of the processed signal can therefore be raised mainly in sections that contain many degradation components, which improves subjective quality.

In addition, the conventional binary section decision is abandoned: a continuous-valued evaluation value is calculated and the weighted-addition coefficients of the input signal and the processed signal are controlled continuously on that basis, so quality degradation caused by section determination errors is avoided.

In addition, the output signal is generated by processing the input signal, which carries much of the background noise information. A stable quality improvement that depends little on noise type or spectral shape is therefore obtained while the characteristics of the actual background noise are retained, and degradation components caused by excitation coding and the like are also improved.

In addition, the processing uses only the input signal up to the present time, so no particularly large delay is needed, and depending on how the input signal and the processed signal are added, delay other than the processing time itself can be eliminated. If the level of the input signal is lowered as the level of the processed signal is raised, there is no need to superimpose large pseudo noise to mask the degradation components as in conventional methods; conversely, the background noise level can even be made somewhat lower or higher to suit the application. Naturally, even when removing degraded sound caused by speech encoding and decoding, no additional transmitted information is needed, unlike conventional methods.

The sound signal processing method and apparatus of the present invention apply predetermined processing in the spectral domain to the input signal to generate a processed signal in which the degradation components of the input signal are subjectively unobtrusive, and control the addition weights of the input signal and the processed signal by a predetermined evaluation value. In addition to the effects of the signal processing method described above, degradation components can therefore be suppressed finely in the spectral domain, further improving subjective quality.

Because the input signal and the processed signal are added with weights in the spectral domain, the sound signal processing method of the present invention, in addition to the effects described above, can omit some or all of the Fourier transform and inverse Fourier transform that it would otherwise require, for example when it is connected after a noise suppression method that operates in the spectral domain. This simplifies the processing.

Because the weighted addition is controlled independently for each frequency component, the sound signal processing method of the present invention, in addition to the effects described above, replaces with the processed signal mainly those components in which quantization noise or degradation components dominate, and no longer replaces good components that contain little quantization noise or degradation. Quantization noise and degradation components can thus be suppressed subjectively while the characteristics of the input signal are well preserved, improving subjective quality.

Because smoothing of the amplitude spectrum components is performed as the processing, the sound signal processing method of the present invention, in addition to the effects described above, can effectively suppress unstable fluctuations of the amplitude spectrum components caused by quantization noise and the like, improving subjective quality.

Because a disturbance is added to the phase spectrum components as the processing, the sound signal processing method of the present invention, in addition to the effects described above, can disturb the relationships among the phase components of quantization noise and degradation components, which tend to have peculiar interrelationships among their phase components and are often perceived as characteristic degradation. This improves subjective quality.

Because the smoothing strength or the disturbance strength is controlled according to the magnitude of the amplitude spectrum components of the input signal or of the perceptually weighted input signal, the sound signal processing method of the present invention, in addition to the effects described above, processes mainly those components in which quantization noise or degradation components dominate because the amplitude spectrum component is small, and no longer processes good components that contain little quantization noise or degradation. Quantization noise and degradation components can thus be suppressed subjectively while the characteristics of the input signal are well preserved, improving subjective quality.

Because the smoothing strength or the disturbance strength is controlled according to the degree of temporal continuity of the spectral components of the input signal or of the perceptually weighted input signal, the sound signal processing method of the present invention, in addition to the effects described above, processes mainly those components that tend to contain much quantization noise or degradation because their spectral continuity is low, and no longer processes good components that contain little quantization noise or degradation. Quantization noise and degradation components can thus be suppressed subjectively while the characteristics of the input signal are well preserved, improving subjective quality.

Because the smoothing strength or the disturbance strength is controlled according to the magnitude of the temporal variability of the evaluation value, the sound signal processing method of the present invention, in addition to the effects described above, can suppress unnecessarily strong processing in sections where the characteristics of the input signal are changing, and in particular can prevent the dulling and echo caused by amplitude smoothing.

本発明の音信号加工方法は、上記発明の音信号加工方法における所定の評価値として背景雑音らしさの度合を用いるようにしたので、上記音信号加工方法が持つ効果に加えて、量子化雑音や劣化成分が多く発生しがちな背景雑音区間に対して重点的な加工が加えられ、背景雑音以外の区間についてもその区間に適切な加工（加工しない、低レベルの加工を行うなど）が選択されるので、主観品質を改善できる効果がある。 The sound signal processing method of the present invention uses the degree of background noise likeness as the predetermined evaluation value in the sound signal processing method of the present invention. Prioritized processing is applied to background noise sections where deterioration components tend to occur frequently, and appropriate processing (no processing, low-level processing, etc.) is selected for sections other than background noise. Therefore, there is an effect that the subjective quality can be improved.

本発明の音信号加工方法は、上記発明の音信号加工方法における前記所定の評価値として摩擦音らしさの度合を用いるようにしたので、上記音信号加工方法が持つ効果に加えて、量子化雑音や劣化成分が多く発生しがちな摩擦音区間に対して重点的な加工が加えられ、摩擦音以外の区間についてもその区間に適切な加工（加工しない、低レベルの加工を行うなど）が選択されるので、主観品質を改善できる効果がある。 The sound signal processing method of the present invention uses the degree of fricative likeness as the predetermined evaluation value in the sound signal processing method of the present invention.In addition to the effects of the sound signal processing method, quantization noise and noise Focused processing is applied to the friction noise section where a lot of deterioration components tend to occur, and appropriate processing (no processing, low-level processing, etc.) is selected for the section other than the friction sound as well. This has the effect of improving the subjective quality.

本発明の音信号加工方法は、音声符号化処理によって生成された音声符号を入力とし、この音声符号を復号して復号音声を生成し、この復号音声を入力として上記音信号加工方法を用いた信号加工処理を施して加工音声を生成し、この加工音声を出力音声として出力するようにしたので、上記音信号加工方法が持つ主観品質改善効果等をそのまま持った音声復号が実現される効果がある。 The sound signal processing method of the present invention uses a sound code generated by a sound coding process as an input, decodes the sound code to generate a decoded sound, and uses the decoded sound as an input to use the sound signal processing method. Since the processed voice is generated by performing the signal processing and the processed voice is output as the output voice, the effect that the voice decoding having the subjective quality improvement effect and the like of the above-described sound signal processing method can be realized is realized. is there.

The sound signal processing method of the present invention takes as input a speech code generated by a speech encoding process, decodes the speech code to generate decoded speech, performs predetermined signal processing on the decoded speech to generate processed speech, applies post-filter processing to the decoded speech, analyzes the decoded speech before or after the post-filter to calculate a predetermined evaluation value, and outputs the weighted sum of the post-filtered decoded speech and the processed speech based on this evaluation value. Therefore, in addition to realizing speech decoding that retains the subjective quality improvement and other effects of the above sound signal processing method, processed speech that is not affected by the post-filter can be generated, and highly accurate addition-weight control becomes possible based on a highly accurate evaluation value calculated without being affected by the post-filter, so the subjective quality is further improved.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

Embodiment 1.
FIG. 1 shows the overall configuration of a speech decoding method to which the sound signal processing method according to this embodiment is applied. In the figure, 1 is a speech decoding device, 2 is a signal processing unit that executes the signal processing method according to the present invention, 3 is a speech code, 4 is a speech decoding unit, 5 is decoded speech, and 6 is output speech. The signal processing unit 2 comprises a signal transformation unit 7, a signal evaluation unit 12, and a weighted addition unit 18. The signal transformation unit 7 comprises a Fourier transform unit 8, an amplitude smoothing unit 9, a phase disturbance unit 10, and an inverse Fourier transform unit 11. The signal evaluation unit 12 comprises an inverse filter unit 13, a power calculation unit 14, a background noise likeness calculation unit 15, an estimated noise power update unit 16, and an estimated noise spectrum update unit 17.

The operation is described below with reference to the drawings.

First, the speech code 3 is input to the speech decoding unit 4 in the speech decoding device 1. The speech code 3 is output by a separate speech encoding unit as the result of encoding a speech signal, and reaches the speech decoding unit 4 via a communication channel or a storage device.

The speech decoding unit 4 applies to the speech code 3 the decoding process that pairs with the speech encoding unit, and outputs the resulting signal of a predetermined length (one frame length) as the decoded speech 5. The decoded speech 5 is then input to the signal transformation unit 7, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2.

The Fourier transform unit 8 in the signal transformation unit 7 applies a window to the signal formed by joining the input decoded speech 5 of the current frame with, where necessary, the most recent part of the decoded speech 5 of the previous frame. It then performs a Fourier transform on the windowed signal to calculate the spectral component at each frequency, and outputs the result to the amplitude smoothing unit 9. Typical Fourier transform processes include the discrete Fourier transform (DFT) and the fast Fourier transform (FFT). Various windows, such as trapezoidal, rectangular, and Hanning windows, can be applied; here a modified trapezoidal window is used, in which the sloped part at each end of a trapezoidal window is replaced by one half of a Hanning window. An example of the actual shape and the time relationship with the decoded speech 5 and the output speech 6 are described later with reference to the drawings.

The amplitude smoothing unit 9 smooths the amplitude component of the per-frequency spectrum input from the Fourier transform unit 8 and outputs the smoothed spectrum to the phase disturbance unit 10. Smoothing along either the frequency axis or the time axis suppresses degraded sound such as quantization noise. However, if smoothing along the frequency axis is too strong, the spectrum becomes dull and the characteristics of the original background noise are often lost. Likewise, if smoothing along the time axis is too strong, the same sound persists for a long time and a reverberant impression results. After tuning against a variety of background noises, the quality of the output speech 6 was best with no smoothing along the frequency axis and with the amplitude smoothed in the logarithmic domain along the time axis. The smoothing method in that case is given by the following equation.

$$y_i = (1-\alpha)\,y_{i-1} + \alpha\,x_i \qquad \text{(Equation 1)}$$
Here, $x_i$ is the log-amplitude spectrum value of the current frame (the $i$-th frame) before smoothing, $y_{i-1}$ is the log-amplitude spectrum value of the previous frame (the $(i-1)$-th frame) after smoothing, $y_i$ is the log-amplitude spectrum value of the current frame after smoothing, and $\alpha$ is a smoothing coefficient with a value from 0 to 1. The optimum value of $\alpha$ depends on the frame length, the level of the degraded sound to be removed, and so on, but is generally about 0.5.
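As a minimal sketch of Equation 1, the recursion can be applied independently to each frequency bin. The function name `smooth_log_amplitude` and the sample values are illustrative assumptions, not part of the original disclosure.

```python
def smooth_log_amplitude(prev_smoothed, current, alpha=0.5):
    """Equation 1 per frequency bin: y_i = (1 - alpha) * y_{i-1} + alpha * x_i."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(prev_smoothed) != len(current):
        raise ValueError("both frames must have the same number of bins")
    return [(1.0 - alpha) * y_prev + alpha * x
            for y_prev, x in zip(prev_smoothed, current)]


# Log-amplitude values for three bins in two consecutive frames (made-up numbers).
y_prev = [2.0, 4.0, 6.0]
x_now = [4.0, 4.0, 0.0]
y_now = smooth_log_amplitude(y_prev, x_now, alpha=0.5)
```

With `alpha` near 1 the output follows the current frame closely; with `alpha` near 0 the previous smoothed frame dominates, which corresponds to the stronger time-axis smoothing discussed above.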

The phase disturbance unit 10 disturbs the phase component of the smoothed spectrum input from the amplitude smoothing unit 9 and outputs the disturbed spectrum to the inverse Fourier transform unit 11. To disturb each phase component, a phase angle within a predetermined range is generated from random numbers and added to the original phase angle. When no limit is placed on the range of the generated phase angle, each phase component can simply be replaced by a randomly generated phase angle. When degradation due to encoding or the like is large, the range of the generated phase angle is not limited.
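The two variants of phase disturbance described here (a bounded random offset, or full replacement of the phase) might be sketched as follows. The function name `disturb_phase`, the `max_angle` parameter, and the uniform distribution are assumptions made for illustration; the text only specifies that random numbers are used.

```python
import cmath
import math
import random


def disturb_phase(spectrum, max_angle=None, rng=None):
    """Return a copy of `spectrum` with perturbed phases and unchanged magnitudes.

    max_angle=None : replace each phase with a random angle between -pi and pi.
    max_angle=a    : add a random offset between -a and a to each phase.
    """
    rng = rng or random.Random()
    out = []
    for c in spectrum:
        magnitude, phase = cmath.polar(c)
        if max_angle is None:
            phase = rng.uniform(-math.pi, math.pi)   # unrestricted: plain replacement
        else:
            phase += rng.uniform(-max_angle, max_angle)
        out.append(cmath.rect(magnitude, phase))
    return out
```

This sketch treats every bin independently. For a real-valued signal, a practical implementation would also have to keep the spectrum conjugate-symmetric so that the inverse transform remains real; that bookkeeping is omitted here.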

The inverse Fourier transform unit 11 applies an inverse Fourier transform to the disturbed spectrum input from the phase disturbance unit 10 to return it to the signal domain, joins the result to the preceding and following frames while applying a window for smooth connection, and outputs the resulting signal to the weighted addition unit 18 as the transformed decoded speech 34.

The inverse filter unit 13 in the signal evaluation unit 12 applies inverse filtering to the decoded speech 5 input from the speech decoding unit 4, using the estimated noise spectrum parameter stored in the estimated noise spectrum update unit 17 described later, and outputs the inverse-filtered decoded speech to the power calculation unit 14. This inverse filtering suppresses the amplitude of components in which the background noise amplitude is large, that is, components in which speech and background noise are likely to be of comparable strength. Compared with the case without inverse filtering, a larger signal power ratio between speech sections and background noise sections is obtained.

The estimated noise spectrum parameter is chosen with regard to its affinity with the speech encoding and decoding processes and to software sharing. At present, line spectrum pairs (LSP) are used in most cases. Similar effects can be obtained with other spectral envelope parameters such as linear prediction coefficients (LPC) or the cepstrum, or with the amplitude spectrum itself. The update process in the estimated noise spectrum update unit 17, described later, is simplest when built on linear interpolation or averaging; among the spectral envelope parameters, LSP and the cepstrum are suitable because the filter is guaranteed to remain stable under linear interpolation or averaging. The cepstrum is better at representing the spectrum of noise components, whereas LSP makes the inverse filter unit easier to construct. When the amplitude spectrum is used, either LPC having this amplitude spectrum characteristic are calculated and used in the inverse filter, or amplitude transformation is applied to the Fourier transform of the decoded speech 5 (equal to the output of the Fourier transform unit 8) to achieve the same effect as the inverse filter.

The power calculation unit 14 calculates the power of the inverse-filtered decoded speech input from the inverse filter unit 13 and outputs the calculated power value to the background noise likeness calculation unit 15.

The background noise likeness calculation unit 15 calculates the background noise likeness of the current decoded speech 5 from the power input from the power calculation unit 14 and the estimated noise power stored in the estimated noise power update unit 16, described later, and outputs it to the weighted addition unit 18 as the addition control value 35. It also outputs the calculated background noise likeness to the estimated noise power update unit 16 and the estimated noise spectrum update unit 17, and passes the power input from the power calculation unit 14 on to the estimated noise power update unit 16. In the simplest case, the background noise likeness can be calculated by the following equation.

$$v = \log(p_N) - \log(p) \qquad \text{(Equation 2)}$$
Here, $p$ is the power input from the power calculation unit 14, $p_N$ is the estimated noise power stored in the estimated noise power update unit 16, and $v$ is the calculated background noise likeness.

In this case, the larger the value of $v$ (if negative, the smaller its absolute value), the more the signal resembles background noise. Various other calculation methods are conceivable, such as computing $p_N / p$ and using it as $v$.
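Both ways of computing the background noise likeness can be sketched in a few lines. The function names are hypothetical, the natural logarithm is an assumption (the text does not state the base), and both powers are assumed to be strictly positive.

```python
import math


def noise_likeness_log(power, noise_power):
    """Equation 2: v = log(p_N) - log(p)."""
    return math.log(noise_power) - math.log(power)


def noise_likeness_ratio(power, noise_power):
    """Alternative mentioned in the text: v = p_N / p."""
    return noise_power / power


p_noise = 1.0
v_noise_frame = noise_likeness_log(1.0, p_noise)     # power equal to the noise estimate
v_speech_frame = noise_likeness_log(100.0, p_noise)  # power far above the noise estimate
```

A frame whose power matches the noise estimate gives a value of 0, while a much louder frame gives a strongly negative value, which is the ordering the text describes.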

The estimated noise power update unit 16 updates the estimated noise power stored inside it, using the background noise likeness and the power input from the background noise likeness calculation unit 15. For example, when the input background noise likeness is high (the value of $v$ is large), the update is performed by reflecting the input power in the estimated noise power according to the following equation.

$$\log(p_N') = (1-\beta)\,\log(p_N) + \beta\,\log(p) \qquad \text{(Equation 3)}$$
Here, $\beta$ is an update rate constant with a value from 0 to 1, and should be set relatively close to 0. The update is performed by evaluating the right-hand side of this equation and taking $p_N'$ on the left-hand side as the new estimated noise power.
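The gated update of Equation 3 might look like the sketch below. The threshold `v_threshold` that decides when the likeness counts as "high", and the default value of `beta`, are assumptions chosen only for illustration; the text gives no specific values.

```python
import math


def update_noise_power(noise_power, power, v, beta=0.05, v_threshold=-0.5):
    """Equation 3, applied only when the frame looks like background noise.

    log(p_N') = (1 - beta) * log(p_N) + beta * log(p)
    """
    if v < v_threshold:
        return noise_power  # not noise-like enough: keep the old estimate
    log_new = (1.0 - beta) * math.log(noise_power) + beta * math.log(power)
    return math.exp(log_new)
```

Because the interpolation is done on the logarithm, the updated estimate is a weighted geometric mean of the old estimate and the current power.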

Various modifications and improvements to this update method are possible, such as referring to the variability between frames to further improve estimation accuracy, storing several past input powers and estimating the noise power by statistical analysis, or using the minimum value of $p$ directly as the estimated noise power.

The estimated noise spectrum update unit 17 first analyzes the input decoded speech 5 to calculate the spectrum parameter of the current frame. The spectrum parameter to be calculated is as described for the inverse filter unit 13; in most cases LSP is used. It then updates the estimated noise spectrum stored inside it, using the background noise likeness input from the background noise likeness calculation unit 15 and the spectrum parameter calculated here. For example, when the input background noise likeness is high (the value of $v$ is large), the update is performed by reflecting the calculated spectrum parameter in the estimated noise spectrum according to the following equation.

$$x_N' = (1-\gamma)\,x_N + \gamma\,x \qquad \text{(Equation 4)}$$
Here, $x$ is the spectrum parameter of the current frame and $x_N$ is the estimated noise spectrum (parameter). $\gamma$ is an update rate constant with a value from 0 to 1, and should be set relatively close to 0. The update is performed by evaluating the right-hand side of this equation and taking $x_N'$ on the left-hand side as the new estimated noise spectrum (parameter).
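Equation 4 can be sketched as an element-wise interpolation of parameter vectors. The example below uses made-up LSP-like values to illustrate one property relevant to the choice of LSP: a weighted average of two strictly increasing vectors, with weights between 0 and 1, is again strictly increasing, so the ordering that LSP-based filters rely on is preserved. The function names and the numbers are illustrative assumptions.

```python
def update_noise_spectrum(noise_params, frame_params, gamma=0.05):
    """Equation 4 per element: x_N' = (1 - gamma) * x_N + gamma * x."""
    if len(noise_params) != len(frame_params):
        raise ValueError("parameter vectors must have the same length")
    return [(1.0 - gamma) * xn + gamma * x
            for xn, x in zip(noise_params, frame_params)]


def is_strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


# Made-up 4th-order LSP vectors in radians, each strictly increasing in (0, pi).
noise_lsp = [0.30, 0.90, 1.60, 2.40]
frame_lsp = [0.50, 1.10, 1.50, 2.80]
updated = update_noise_spectrum(noise_lsp, frame_lsp, gamma=0.25)
```

As with the noise power, the update would in practice be applied only to frames judged to be noise-like; that gating is left out here to keep the sketch short.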

As with the method for updating the estimated noise power, various improvements can be made to this method for updating the estimated noise spectrum.

As the final step, the weighted addition unit 18 weights and adds the decoded speech 5 input from the speech decoding unit 4 and the transformed decoded speech 34 input from the signal transformation unit 7, based on the addition control value 35 input from the signal evaluation unit 12, and outputs the resulting output speech 6. The weighting is controlled so that, as the addition control value 35 becomes larger (the background noise likeness becomes higher), the weight for the decoded speech 5 becomes smaller and the weight for the transformed decoded speech 34 becomes larger. Conversely, as the addition control value 35 becomes smaller (the background noise likeness becomes lower), the weight for the decoded speech 5 becomes larger and the weight for the transformed decoded speech 34 becomes smaller.

To suppress degradation of the output speech 6 caused by abrupt changes of the weights between frames, it is desirable to smooth the addition control value 35 or the weighting coefficients so that they change gradually from sample to sample.
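One simple way to make a weighting coefficient change gradually from sample to sample is a linear ramp across the frame. The text does not prescribe a particular smoothing rule, so the linear ramp and the function name `ramp_weights` are assumptions.

```python
def ramp_weights(prev_weight, new_weight, frame_len):
    """Linearly move from prev_weight to new_weight across one frame.

    The last sample reaches new_weight exactly, so the next frame can start from it.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    step = (new_weight - prev_weight) / frame_len
    return [prev_weight + step * (n + 1) for n in range(frame_len)]


w = ramp_weights(0.0, 1.0, 4)
```

Each output sample of the frame would then be formed with its own pair of per-sample weights instead of a single pair per frame.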

FIG. 2 shows examples of how the weighted addition unit 18 controls the weighted addition based on the addition control value.

FIG. 2(a) shows linear control using two thresholds $v_1$ and $v_2$ on the addition control value 35. When the addition control value 35 is less than $v_1$, the weighting coefficient $w_S$ for the decoded speech 5 is set to 1 and the weighting coefficient $w_N$ for the transformed decoded speech 34 is set to 0. When the addition control value 35 is $v_2$ or more, $w_S$ is set to 0 and $w_N$ is set to $A_N$. When the addition control value 35 is $v_1$ or more and less than $v_2$, $w_S$ is calculated linearly between 1 and 0, and $w_N$ linearly between 0 and $A_N$.

With this control, only the transformed decoded speech 34 is output when the section can be judged with certainty to be background noise ($v_2$ or more), and the decoded speech 5 itself is output when the section can be judged with certainty to be speech (less than $v_1$). When it cannot be determined whether the section is speech or background noise ($v_1$ or more and less than $v_2$), a mixture of the decoded speech 5 and the transformed decoded speech 34 is output in a ratio that depends on which tendency is stronger.
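The piecewise-linear control of FIG. 2(a) can be sketched as follows. The function names and the threshold values used in any call are illustrative assumptions; only the shape of the mapping comes from the text.

```python
def weights_fig2a(v, v1, v2, a_n=1.0):
    """Return (w_S, w_N) for addition control value v, with thresholds v1 < v2."""
    if not v1 < v2:
        raise ValueError("thresholds must satisfy v1 < v2")
    if v < v1:           # judged with certainty to be speech
        return 1.0, 0.0
    if v >= v2:          # judged with certainty to be background noise
        return 0.0, a_n
    t = (v - v1) / (v2 - v1)   # 0 at v1, approaching 1 at v2
    return 1.0 - t, a_n * t


def mix(decoded, transformed, w_s, w_n):
    """Weighted addition of decoded speech and transformed decoded speech."""
    return [w_s * d + w_n * t for d, t in zip(decoded, transformed)]
```

For example, with assumed thresholds `v1 = -2.0` and `v2 = 0.0`, a control value of `-1.0` falls halfway between them and yields equal weights for the two signals.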

If a value of 1 or less is given as the weighting coefficient $A_N$ by which the transformed decoded speech 34 is multiplied when the section can be judged with certainty to be background noise ($v_2$ or more), the result is amplitude suppression of the background noise sections. Conversely, a value of 1 or more gives amplitude emphasis of the background noise sections. The amplitude of background noise sections is often reduced by the speech encoding and decoding process; in that case, emphasizing the amplitude of those sections improves the reproducibility of the background noise. Whether to apply amplitude suppression or amplitude emphasis depends on the application, user requirements, and so on.

FIG. 2(b) shows the case in which a new threshold $v_3$ is added and the weighting coefficients are calculated linearly between $v_1$ and $v_3$ and between $v_3$ and $v_2$. By adjusting the values of the weighting coefficients at the threshold $v_3$, the mixing ratio can be set more finely for the case where it cannot be determined whether the section is speech or background noise ($v_1$ or more and less than $v_2$). In general, when two signals with low phase correlation are added, the power of the resulting signal is smaller than the sum of the powers of the two signals before addition. Making the sum of the two weighting coefficients in the range from $v_1$ to less than $v_2$ larger than 1 or $w_N$ suppresses this power drop. The same effect can also be obtained by taking the square root of each weighting coefficient obtained according to FIG. 2(a), multiplying it by a constant, and using the result as the new weighting coefficient.
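The power drop and the square-root remedy can be illustrated with a short worked calculation. It rests on simplifying assumptions that are not stated in the text: the decoded speech $s$ and the transformed decoded speech $t$ are zero-mean, fully uncorrelated, and of equal power $P$, and $A_N = 1$.

```latex
% Assumptions: s and t are zero-mean, uncorrelated, and have equal power P; A_N = 1.
\begin{align*}
y &= w_S\,s + w_N\,t \\
E[y^2] &= w_S^2\,E[s^2] + 2\,w_S w_N\,E[s\,t] + w_N^2\,E[t^2]
        = (w_S^2 + w_N^2)\,P \\
w_S = w_N = \tfrac{1}{2}: \quad E[y^2] &= \tfrac{1}{2}P \quad (\text{about } 3\ \text{dB below } P) \\
w_S' = \sqrt{w_S},\; w_N' = \sqrt{w_N}: \quad E[y'^2] &= (w_S + w_N)\,P = P
\end{align*}
```

Under these assumptions, linear weights that sum to 1 lose up to half the power at the midpoint of the transition, whereas square-root weights keep the output power constant across it.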

FIG. 2(c) shows the case in which, in the range below $v_1$ of FIG. 2(a), the weighting coefficient $w_N$ applied to the transformed decoded speech 34 is given a value $B_N$ greater than 0, and $w_N$ in the range from $v_1$ to less than $v_2$ is adjusted accordingly. When quantization noise or degraded sound in speech sections is large, for example when the background noise level is high or the compression ratio of the encoding is very high, adding the transformed decoded speech even in the range known with certainty to be speech makes the degraded sound less audible.

FIG. 2(d) is a control example for the case in which the background noise likeness calculation unit 15 outputs the estimated noise power divided by the current power ($p_N / p$) as the background noise likeness (addition control value 35). In this case the addition control value 35 indicates the proportion of background noise contained in the decoded speech 5, so the weighting coefficients are calculated so that the signals are mixed in a ratio proportional to this value. Specifically, when the addition control value 35 is 1 or more, $w_N$ is 1 and $w_S$ is 0; when it is less than 1, $w_N$ is the addition control value 35 itself and $w_S$ is $(1 - w_N)$.
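The ratio-based control of FIG. 2(d) reduces to a clamp, as in this small sketch. The function name `weights_fig2d` is a hypothetical choice.

```python
def weights_fig2d(ratio):
    """Return (w_S, w_N) when the addition control value is the ratio p_N / p."""
    if ratio < 0.0:
        raise ValueError("a power ratio cannot be negative")
    w_n = 1.0 if ratio >= 1.0 else ratio   # clamp at 1
    return 1.0 - w_n, w_n
```

A frame whose power is four times the noise estimate therefore receives a quarter of the transformed signal and three quarters of the decoded speech.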

FIG. 3 is an explanatory diagram showing an example of the actual shapes of the cut-out window in the Fourier transform unit 8 and the connection window in the inverse Fourier transform unit 11, and their time relationship with the decoded speech 5.

The decoded speech 5 is output from the speech decoding unit 4 at intervals of a predetermined time length (one frame length). Let this frame length be $N$ samples. FIG. 3(a) shows an example of the decoded speech 5, in which $x(0)$ to $x(N-1)$ is the decoded speech 5 of the current input frame. The Fourier transform unit 8 multiplies the decoded speech 5 of FIG. 3(a) by the modified trapezoidal window of FIG. 3(b) to cut out a signal of length $(N + NX)$. $NX$ is the length of each of the sections at the two ends of the modified trapezoidal window where its value is less than 1. These two end sections are equal to the first and second halves of a Hanning window of length $(2NX)$. The inverse Fourier transform unit 11 multiplies the signal generated by the inverse Fourier transform by the modified trapezoidal window of FIG. 3(c) and, as shown by the broken lines in FIG. 3(c), adds it to the corresponding signals obtained for the preceding and following frames while preserving their time relationship, producing the continuous transformed decoded speech 34 (FIG. 3(d)).
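The window shape and the frame connection can be sketched as below. The exact sampling of the Hanning halves is not given in the text; the half-sample offset used here is an assumption chosen so that the falling end of one frame and the rising end of the next sum to exactly 1. The function names and the sizes `N = 8`, `NX = 4` are likewise illustrative.

```python
import math


def modified_trapezoid_window(n, nx):
    """Window of length n + nx: nx-sample Hanning rise, flat top, nx-sample Hanning fall."""
    if not 0 < nx <= n:
        raise ValueError("need 0 < nx <= n")
    # Half-sample offset so the rising and falling halves sum to 1 (assumed sampling).
    rise = [0.5 - 0.5 * math.cos(math.pi * (k + 0.5) / nx) for k in range(nx)]
    flat = [1.0] * (n - nx)
    fall = rise[::-1]
    return rise + flat + fall


def overlap_add(frames, n):
    """Add equal-length frames whose start positions are n samples apart."""
    length = n * (len(frames) - 1) + len(frames[0])
    out = [0.0] * length
    for i, frame in enumerate(frames):
        for k, sample in enumerate(frame):
            out[i * n + k] += sample
    return out


N, NX = 8, 4
w = modified_trapezoid_window(N, NX)
total = overlap_add([w, w, w], N)   # the window applied once per frame
```

Away from the two outer edges, `total` is constant at 1, so a window applied once leaves the level of an unchanged signal intact across frame boundaries. If the same window is applied twice to a signal whose shape survives the processing, the overlap regions sum to less than 1.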

For the section (length $NX$) used for connection with the signal of the next frame, the transformed decoded speech 34 is not yet finalized at the time of the current frame. The newly finalized transformed decoded speech 34 is therefore $x'(-NX)$ to $x'(N-NX-1)$. Consequently, the output speech 6 obtained for the decoded speech 5 of the current frame is given by the following equation.

$$y(n) = x(n) + x'(n), \qquad n = -NX, \ldots, N-NX-1 \qquad \text{(Equation 5)}$$
Here, $y(n)$ is the output speech 6. In this case the signal processing unit 2 incurs a processing delay of at least $NX$.

For applications in which this processing delay $NX$ cannot be tolerated, the output speech 6 can instead be generated by the following equation, accepting a time offset between the decoded speech 5 and the transformed decoded speech 34.

$$y(n) = x(n) + x'(n-NX), \qquad n = 0, \ldots, N-1 \qquad \text{(Equation 6)}$$
In this case, because the decoded speech 5 and the transformed decoded speech 34 are offset in time, degradation may occur when the disturbance in the phase disturbance unit 10 is weak (that is, when the phase characteristics of the decoded speech remain to some extent) or when the spectrum or power changes abruptly within a frame. Degradation is especially likely when the weighting coefficients in the weighted addition unit 18 change greatly and when the two weighting coefficients are of comparable size. However, such degradation is relatively small, and the benefit of introducing the signal processing unit is considerably larger. This method can therefore also be used for applications in which the processing delay $NX$ cannot be tolerated.
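The difference between the time-aligned output (Equation 5) and the low-delay output (Equation 6) is only a matter of indexing, as the sketch below shows. The buffer layout is an assumption: `x_hist` holds the decoded samples x(-NX) to x(N-1), and `x_trans` holds the newly finalized transformed samples x'(-NX) to x'(N-NX-1). As in the two equations, the weighting coefficients are omitted.

```python
def _check(x_hist, x_trans, nx):
    if nx < 0 or len(x_hist) != len(x_trans) + nx:
        raise ValueError("x_hist must hold NX more samples than x_trans")


def output_aligned(x_hist, x_trans, nx):
    """Equation 5: y(n) = x(n) + x'(n) for n = -NX .. N-NX-1 (delay of NX samples)."""
    _check(x_hist, x_trans, nx)
    return [x_hist[k] + x_trans[k] for k in range(len(x_trans))]


def output_low_delay(x_hist, x_trans, nx):
    """Equation 6: y(n) = x(n) + x'(n - NX) for n = 0 .. N-1 (no added delay)."""
    _check(x_hist, x_trans, nx)
    return [x_hist[k + nx] + x_trans[k] for k in range(len(x_trans))]


# N = 4, NX = 2, with made-up sample values.
x_hist = [10, 11, 12, 13, 14, 15]   # x(-2) .. x(3)
x_trans = [1, 2, 3, 4]              # x'(-2) .. x'(1)
y5 = output_aligned(x_hist, x_trans, 2)
y6 = output_low_delay(x_hist, x_trans, 2)
```

In the aligned variant the output covers samples up to N-NX-1 only, which is where the NX-sample delay comes from; the low-delay variant covers the whole current frame but pairs each decoded sample with a transformed sample from NX samples earlier.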

In the case of FIG. 3, the modified trapezoidal window is applied both before the Fourier transform and after the inverse Fourier transform, which may reduce the amplitude at the connection portions. This amplitude reduction also tends to occur when the disturbance in the phase disturbance unit 10 is weak. In such a case, the amplitude reduction can be suppressed by changing the window before the Fourier transform to a rectangular window. Normally, because the phase is heavily altered by the phase disturbance unit 10, the shape of the first modified trapezoidal window does not appear in the signal after the inverse Fourier transform, so the second windowing is needed for smooth connection with the transformed decoded speech 34 of the preceding and following frames.

Here, the processing of the signal transformation unit 7, the signal evaluation unit 12, and the weighted addition unit 18 is all performed frame by frame, but this is not a limitation. For example, one frame may be divided into several subframes, the processing of the signal evaluation unit 12 performed for each subframe to calculate an addition control value 35 per subframe, and the weighting control in the weighted addition unit 18 also performed per subframe. Because the signal transformation uses a Fourier transform, too short a frame length makes the spectral analysis unstable and the transformed decoded speech 34 difficult to stabilize. The background noise likeness, on the other hand, can be calculated relatively stably even over shorter sections, so calculating it per subframe and controlling the weights finely improves quality at speech onsets and similar portions.

It is also possible to perform the processing of the signal evaluation unit 12 for each subframe and combine all the addition control values in the frame into a small number of addition control values 35. When a speech section must not be mistaken for background noise, the minimum of all the addition control values (the minimum background noise likeness) can be selected and output as the addition control value 35 representing the frame.

Furthermore, the frame length of the decoded speech 5 and the processing frame length of the signal transformation unit 7 need not be the same. For example, when the frame length of the decoded speech 5 is too short for the spectral analysis in the signal transformation unit 7, the decoded speech 5 of several frames can be accumulated and the signal transformation performed on them together. In this case, however, accumulating several frames of decoded speech 5 introduces a processing delay. Alternatively, the processing frame length of the signal transformation unit 7, or of the whole signal processing unit 2, may be set entirely independently of the frame length of the decoded speech 5. Signal buffering then becomes more complicated, but the processing frame length best suited to the signal processing can be selected regardless of the frame length of the decoded speech 5, which maximizes the quality of the signal processing unit 2.

Here, the inverse filter unit 13, the power calculation unit 14, the background noise likeness calculation unit 15, the estimated background noise level update unit 16, and the estimated noise spectrum update unit 17 are used to calculate the background noise likeness, but the configuration is not limited to this as long as it evaluates background noise likeness.

According to Embodiment 1, a predetermined signal processing is applied to the input signal (decoded speech) to generate a processed signal (transformed decoded speech) in which the degraded components contained in the input signal are subjectively unobtrusive, and the addition weights of the input signal and the processed signal are controlled by a predetermined evaluation value (background noise likeness). The proportion of the processed signal is therefore increased mainly in sections containing many degraded components, which improves subjective quality.

Because the signal processing is performed in the spectral domain, degraded components can be suppressed at a fine granularity in the spectral domain, which further improves subjective quality.

Because the processing consists of smoothing the amplitude spectrum components and disturbing the phase spectrum components, unstable fluctuations of the amplitude spectrum components caused by quantization noise and the like are suppressed well. In addition, quantization noise often acquires peculiar interrelations among its phase components and is then perceived as a characteristic degradation; disturbing the relations among the phase components counters this and improves subjective quality.

The conventional binary decision between speech section and background noise section is abolished. Instead, a continuous quantity, the background noise likeness, is calculated and used to control the weighted addition coefficients of the decoded speech and the transformed decoded speech continuously, which avoids quality degradation caused by section decision errors.

When the quantization noise or degraded sound in speech sections is large, the transformed decoded speech can be added even in sections known with certainty to be speech, making the degraded sound harder to hear.

Because the output speech is generated by processing the decoded speech, which contains much information about the background noise, a stable quality improvement that depends little on the noise type or spectral shape is obtained while the characteristics of the actual background noise are retained. An improvement is also obtained for degraded components caused by excitation coding and the like.

Because the processing uses only the decoded speech up to the present, no particularly large delay is required, and depending on how the decoded speech and the transformed decoded speech are added, delay other than processing time can be eliminated. Because the level of the decoded speech is lowered as the level of the transformed decoded speech is raised, there is no need to superimpose large pseudo-noise to mask the quantization noise as in the conventional art; on the contrary, the background noise level can even be made somewhat lower or higher depending on the application. Naturally, since the processing is closed within the speech decoding apparatus or the signal processing unit, no new transmission information needs to be added, unlike the conventional art.

Furthermore, in Embodiment 1 the speech decoding unit and the signal processing unit are clearly separated and little information is exchanged between them, so the method is easy to introduce into a variety of speech decoding apparatuses, including existing ones.

Embodiment 2.
FIG. 4 shows part of the configuration of a sound signal processing apparatus in which the sound signal processing method according to this embodiment is applied in combination with a noise suppression method. In the figure, 36 is an input signal, 8 is a Fourier transform unit, 19 is a noise suppression unit, 39 is a spectrum transformation unit, 12 is a signal evaluation unit, 18 is a weighted addition unit, 11 is an inverse Fourier transform unit, and 40 is an output signal. The spectrum transformation unit 39 consists of an amplitude smoothing unit 9 and a phase disturbance unit 10.
The operation is described below with reference to the figure.

First, the input signal 36 is input to the Fourier transform unit 8 and the signal evaluation unit 12.

The Fourier transform unit 8 applies a window to a signal formed from the input signal 36 of the current frame and, if necessary, the most recent part of the input signal 36 of the previous frame, performs a Fourier transform on the windowed signal to calculate the spectral component for each frequency, and outputs these components to the noise suppression unit 19. The Fourier transform and the windowing are the same as in Embodiment 1.

The noise suppression unit 19 subtracts the estimated noise spectrum stored inside it from the per-frequency spectral components input from the Fourier transform unit 8, and outputs the result as the noise-suppressed spectrum 37 to the weighted addition unit 18 and to the amplitude smoothing unit 9 in the spectrum transformation unit 39. This corresponds to the main part of so-called spectral subtraction. The noise suppression unit 19 then determines whether the current section is a background noise section and, if so, updates its internal estimated noise spectrum using the per-frequency spectral components input from the Fourier transform unit 8. This determination can be simplified by reusing the output of the signal evaluation unit 12 described later.
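The subtraction and update steps might look like the sketch below. The text does not say how the subtraction is carried out, so several details here are assumptions: subtraction is done on amplitudes with the phase kept, negative results are clamped to a floor, and the noise estimate is updated by exponential averaging with a hypothetical `rate` parameter.

```python
import cmath


def spectral_subtract(spectrum, noise_amplitude, floor=0.0):
    """Subtract an estimated noise amplitude from each complex spectral bin.

    Assumption: amplitude-domain subtraction with the original phase kept
    and the result clamped at `floor`; the source only says "subtract".
    """
    result = []
    for x, n in zip(spectrum, noise_amplitude):
        amplitude = max(abs(x) - n, floor)
        result.append(cmath.rect(amplitude, cmath.phase(x)))
    return result


def update_noise_estimate(noise_amplitude, spectrum, rate=0.1):
    """Move the noise estimate toward the current frame's amplitudes.

    Intended to be called only for frames judged to be background noise.
    The exponential averaging and `rate` are illustrative assumptions.
    """
    return [(1.0 - rate) * n + rate * abs(x)
            for n, x in zip(noise_amplitude, spectrum)]
```

Clamping at a floor is one common way to avoid negative amplitudes; the isolated residual peaks it leaves behind are the kind of degraded sound that the later smoothing and phase disturbance are meant to make less noticeable.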

The amplitude smoothing unit 9 in the spectrum transformation unit 39 smooths the amplitude components of the noise-suppressed spectrum 37 input from the noise suppression unit 19 and outputs the smoothed noise-suppressed spectrum to the phase disturbance unit 10. Smoothing along either the frequency axis or the time axis suppresses the degraded sound generated by the noise suppression unit. The same specific smoothing methods as in Embodiment 1 can be used.

The phase disturbance unit 10 in the spectrum transformation unit 39 disturbs the phase components of the smoothed noise-suppressed spectrum input from the amplitude smoothing unit 9 and outputs the disturbed spectrum as the transformed noise-suppressed spectrum 38 to the weighted addition unit 18. The same methods as in Embodiment 1 can be used to disturb each phase component.

The signal evaluation unit 12 analyzes the input signal 36 to calculate the background noise likeness and outputs it as the addition control value 35 to the weighted addition unit 18. The internal configuration and processing of the signal evaluation unit 12 can be the same as in Embodiment 1.

Based on the addition control value 35 input from the signal evaluation unit 12, the weighted addition unit 18 weights and adds the noise-suppressed spectrum 37 input from the noise suppression unit 19 and the transformed noise-suppressed spectrum 38 input from the spectrum transformation unit 39, and outputs the resulting spectrum to the inverse Fourier transform unit 11. As in Embodiment 1, the weights are controlled so that, as the addition control value 35 increases (higher background noise likeness), the weight on the noise-suppressed spectrum 37 decreases and the weight on the transformed noise-suppressed spectrum 38 increases. Conversely, as the addition control value 35 decreases (lower background noise likeness), the weight on the noise-suppressed spectrum 37 increases and the weight on the transformed noise-suppressed spectrum 38 decreases.
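The weight control described above can be sketched as a per-bin cross-fade. The linear mapping from the control value to the weights, the `low` and `high` limits, and the constraint that the two weights sum to one are assumptions made for this example; the text only fixes the direction in which each weight moves.

```python
def weighted_add(original, transformed, control, low=0.0, high=1.0):
    """Cross-fade two complex spectra according to an addition control value.

    A larger `control` (more background-noise-like) shifts weight from the
    original spectrum to the transformed one. The linear, sum-to-one mapping
    between `low` and `high` is an illustrative assumption.
    """
    t = (control - low) / (high - low)
    t = min(max(t, 0.0), 1.0)  # clamp the transformed-spectrum weight to [0, 1]
    w_transformed = t
    w_original = 1.0 - t
    return [w_original * a + w_transformed * b
            for a, b in zip(original, transformed)]
```

With `control=0.25`, each output bin is 75% original and 25% transformed. A mapping whose transformed weight never reaches zero would correspond to also adding the transformed spectrum outside background noise sections.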

Finally, the inverse Fourier transform unit 11 performs an inverse Fourier transform on the spectrum input from the weighted addition unit 18 to return it to the signal domain, joins it to the preceding and following frames while applying a window for smooth concatenation, and outputs the resulting signal as the output signal 40. The windowing and concatenation are the same as in Embodiment 1.

According to Embodiment 2, a predetermined processing is applied to a spectrum degraded by noise suppression or the like to generate a processed spectrum (transformed noise-suppressed spectrum) in which the degraded components are subjectively unobtrusive, and the addition weights of the unprocessed spectrum and the processed spectrum are controlled by a predetermined evaluation value (background noise likeness). The proportion of the processed spectrum is therefore increased mainly in sections (background noise sections) that contain many degraded components and lower the subjective quality, which improves subjective quality.

Because the weighted addition is performed in the spectral domain, the Fourier transform and inverse Fourier transform needed for the processing in Embodiment 1 become unnecessary, which simplifies the processing. The Fourier transform unit 8 and the inverse Fourier transform unit 11 in Embodiment 2 are required by the noise suppression unit 19 in any case.

Because the processing consists of smoothing the amplitude spectrum components and disturbing the phase spectrum components, unstable fluctuations of the amplitude spectrum components caused by quantization noise and the like are suppressed well. In addition, quantization noise and degraded components often acquire peculiar interrelations among their phase components and are then perceived as a characteristic degradation; disturbing the relations among the phase components counters this and improves subjective quality.

Instead of a binary decision on whether a section is a background noise section, a continuous quantity, the background noise likeness, is calculated and used to control the weighted addition coefficients continuously, which avoids quality degradation caused by section decision errors.

When the degraded sound outside background noise sections is large, performing the weighted addition as in FIG. 2(c) adds the transformed noise-suppressed spectrum even in sections known with certainty not to be background noise, making the degraded sound harder to hear.

Because the transformed noise-suppressed spectrum is generated by applying simple processing directly to the noise-suppressed spectrum, a stable quality improvement that depends little on the noise type or spectral shape is obtained.

Because the processing uses only the noise-suppressed spectrum up to the present, no large delay is needed beyond the delay of the noise suppression unit 19. Because the addition level of the original noise-suppressed spectrum is lowered as the addition level of the transformed noise-suppressed spectrum is raised, there is no need to superimpose relatively large noise to mask the quantization noise, and the background noise level can be kept low. Naturally, even when this processing is used as preprocessing for speech encoding, it is closed within the encoding unit, so no new transmission information needs to be added, unlike the conventional art.

Embodiment 3.
FIG. 5, in which parts corresponding to those in FIG. 1 carry the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 20 is a transformation strength control unit that outputs information controlling the transformation strength of the signal transformation unit 7. The transformation strength control unit 20 consists of a perceptual weighting unit 21, a Fourier transform unit 22, a level determination unit 23, a continuity determination unit 24, and a transformation strength calculation unit 25.

The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7, the transformation strength control unit 20, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2.

The perceptual weighting unit 21 in the transformation strength control unit 20 applies perceptual weighting to the decoded speech 5 input from the speech decoding unit 4 and outputs the resulting perceptually weighted speech to the Fourier transform unit 22. The perceptual weighting used here is the same as that used in the speech encoding process (the counterpart of the speech decoding process performed by the speech decoding unit 4).

The perceptual weighting commonly used in coding schemes such as CELP analyzes the speech to be coded to calculate linear prediction coefficients (LPC), multiplies them by constants to obtain two modified LPC sets, constructs an ARMA filter that uses the two modified LPC sets as its filter coefficients, and performs perceptual weighting by filtering with this filter. To apply the same perceptual weighting to the decoded speech 5 as in the encoding process, the two modified LPC sets can be derived from either the LPC obtained by decoding the received speech code 3 or the LPC calculated by re-analyzing the decoded speech 5, and used to construct the perceptual weighting filter.
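A common form of such a filter in CELP coders is W(z) = A(z/γ1) / A(z/γ2), where each modified LPC set is obtained by multiplying the i-th coefficient by γ to the power i. The sketch below assumes that form, the sign convention A(z) = 1 + Σ a_i z^-i, and illustrative default values for `gamma_num` and `gamma_den`; none of these specifics are stated in the text, which only mentions constant multiplication and an ARMA filter.

```python
def modified_lpc(lpc, gamma):
    """Scale the i-th LPC coefficient by gamma**i (i starting at 1)."""
    return [a * gamma ** i for i, a in enumerate(lpc, start=1)]


def perceptual_weighting(signal, lpc, gamma_num=0.9, gamma_den=0.6):
    """Filter `signal` with the ARMA filter A(z/gamma_num) / A(z/gamma_den).

    Assumes A(z) = 1 + sum(a_i * z**-i) and zero initial filter state.
    """
    num = modified_lpc(lpc, gamma_num)  # moving-average (numerator) part
    den = modified_lpc(lpc, gamma_den)  # autoregressive (denominator) part
    out = []
    for n, x in enumerate(signal):
        y = x
        for i, b in enumerate(num, start=1):
            if n - i >= 0:
                y += b * signal[n - i]
        for i, a in enumerate(den, start=1):
            if n - i >= 0:
                y -= a * out[n - i]
        out.append(y)
    return out
```

When the two constants are equal, the numerator and denominator cancel and the filter passes the signal unchanged, which is a convenient sanity check.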

In coding schemes such as CELP, coding is performed so as to minimize the distortion of the perceptually weighted speech, so in the perceptually weighted speech, spectral components with large amplitude carry little superimposed quantization noise. Therefore, if speech close to the perceptually weighted speech used at encoding time can be generated within the decoding unit 1, it is useful as control information for the transformation strength in the signal transformation unit 7.

If the speech decoding process in the speech decoding unit 4 includes processing such as a spectral postfilter (as it does in almost all CELP decoders), then strictly speaking, speech close to the perceptually weighted speech used at encoding time is obtained by first either generating speech from which the influence of the postfilter and similar processing has been removed from the decoded speech 5, or extracting the speech immediately before that processing from within the speech decoding unit 4, and then applying perceptual weighting to that speech. However, when the main purpose is to improve the quality of background noise sections, the influence of the spectral postfilter and similar processing in those sections is small, and leaving it in place makes little difference to the effect. Embodiment 3 therefore does not remove the influence of the spectral postfilter or similar processing.

Naturally, if the encoding process does not use perceptual weighting, or if its effect is small enough to ignore, the perceptual weighting unit 21 is unnecessary. In that case, the output of the Fourier transform unit 8 in the signal transformation unit 7 can be supplied to the level determination unit 23 and the continuity determination unit 24 described later, so the Fourier transform unit 22 can also be omitted.

Furthermore, there are methods, such as nonlinear amplitude conversion, that produce an effect close to perceptual weighting in the spectral domain. If the discrepancy from the perceptual weighting method used in the encoding process can be ignored, the output of the Fourier transform unit 8 in the signal transformation unit 7 can be used as the input to the perceptual weighting unit 21, which then applies perceptual weighting to it in the spectral domain and outputs the perceptually weighted spectrum to the level determination unit 23 and the continuity determination unit 24 described later, so that the Fourier transform unit 22 is omitted.

The Fourier transform unit 22 in the transformation strength control unit 20 applies a window to a signal formed from the perceptually weighted speech input from the perceptual weighting unit 21 and, if necessary, the most recent part of the perceptually weighted speech of the previous frame, performs a Fourier transform on the windowed signal to calculate the spectral component for each frequency, and outputs these as the perceptually weighted spectrum to the level determination unit 23 and the continuity determination unit 24. The Fourier transform and the windowing are the same as in the Fourier transform unit 8 of Embodiment 1.

The level determination unit 23 calculates a first transformation strength for each frequency based on the magnitude of each amplitude component of the perceptually weighted spectrum input from the Fourier transform unit 22, and outputs it to the transformation strength calculation unit 25. The smaller an amplitude component of the perceptually weighted spectrum, the larger the proportion of quantization noise, so the first transformation strength should be made stronger. In the simplest case, the mean of all amplitude components is computed, a predetermined threshold Th is added to it, and the first transformation strength is set to 0 for components above this value and to 1 for components below it. FIG. 6 shows the relation between the perceptually weighted spectrum and the first transformation strength when this threshold Th is used. The method of calculating the first transformation strength is not limited to this.
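The simplest threshold rule described above can be sketched as follows. The function name is hypothetical, and treating a component exactly equal to the limit as "below" is an assumption, since the text does not cover that case.

```python
def first_transformation_strength(amplitudes, th):
    """Return 0 or 1 per frequency bin from the perceptually weighted amplitudes.

    Bins whose amplitude exceeds (mean amplitude + th) get strength 0,
    because little quantization noise is expected there; all other bins
    get strength 1. Equality is treated as "not above" (an assumption).
    """
    limit = sum(amplitudes) / len(amplitudes) + th
    return [0.0 if a > limit else 1.0 for a in amplitudes]
```

For amplitudes 1, 2, 3 and 10 with Th = 0, the mean is 4, so only the last bin is left untransformed.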

The continuity determination unit 24 evaluates the continuity in the time direction of each amplitude component or each phase component of the perceptually weighted spectrum input from the Fourier transform unit 22, calculates a second transformation strength for each frequency from the evaluation result, and outputs it to the transformation strength calculation unit 25. For frequency components whose amplitude has low continuity in time, or whose phase has low continuity (after compensating for the phase rotation due to the time shift between frames), it is unlikely that good coding was achieved, so the second transformation strength is made stronger. In the simplest case, the second transformation strength can also be given as 0 or 1 by a decision using a predetermined threshold.

The transformation strength calculation unit 25 calculates the final transformation strength for each frequency from the first transformation strength input from the level determination unit 23 and the second transformation strength input from the continuity determination unit 24, and outputs it to the amplitude smoothing unit 9 and the phase disturbance unit 10 in the signal transformation unit 7. The final transformation strength can be, for example, the minimum, a weighted average, or the maximum of the first and second transformation strengths. This concludes the description of the operation of the transformation strength control unit 20 newly added in Embodiment 3.
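The three combination rules named above can be sketched in one function. The `mode` strings and the `weight` parameter are names chosen for this example only.

```python
def final_transformation_strength(first, second, mode="max", weight=0.5):
    """Combine two per-frequency strength lists into one.

    mode "min":  transform only where both criteria agree (conservative).
    mode "max":  transform where either criterion fires (aggressive).
    mode "mean": weighted average, with `weight` applied to `first`.
    """
    if mode == "min":
        return [min(a, b) for a, b in zip(first, second)]
    if mode == "max":
        return [max(a, b) for a, b in zip(first, second)]
    if mode == "mean":
        return [weight * a + (1.0 - weight) * b for a, b in zip(first, second)]
    raise ValueError("mode must be 'min', 'max' or 'mean'")
```

Choosing the minimum protects well-coded components from unnecessary processing, while the maximum favors suppressing degradation; the weighted average sits between the two.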

Next, the components whose operation changes with the addition of the transformation strength control unit 20 are described.

The amplitude smoothing unit 9 smooths the amplitude components of the per-frequency spectrum input from the Fourier transform unit 8 according to the transformation strength input from the transformation strength control unit 20, and outputs the smoothed spectrum to the phase disturbance unit 10. The smoothing is controlled so that it is stronger for frequency components with a stronger transformation strength. The simplest way to control the smoothing strength is to smooth only where the input transformation strength is large. Other ways of strengthening the smoothing include reducing the smoothing coefficient α in the smoothing formula described in Embodiment 1, or forming the final spectrum as a weighted sum of a spectrum after fixed smoothing and the spectrum before smoothing and reducing the weight on the spectrum before smoothing.
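The last variant above, blending a fixed-smoothed amplitude with the unsmoothed one, might be sketched as below. The three-point moving average along the frequency axis is only a stand-in for the fixed smoothing, since the Embodiment 1 formula is not reproduced here, and the linear blend is likewise an assumption.

```python
def smooth_amplitudes(amplitudes, strengths):
    """Blend each amplitude with a locally smoothed value, bin by bin.

    A strength of 0 leaves the bin untouched and a strength of 1 replaces
    it with the smoothed value. The 3-point moving average stands in for
    the document's fixed smoothing (an assumption for illustration).
    """
    n = len(amplitudes)
    result = []
    for i, a in enumerate(amplitudes):
        lo, hi = max(0, i - 1), min(n, i + 2)
        smoothed = sum(amplitudes[lo:hi]) / (hi - lo)
        s = min(max(strengths[i], 0.0), 1.0)
        result.append((1.0 - s) * a + s * smoothed)
    return result
```

With binary strengths this reduces to the simplest method in the text: smoothing is applied only where the transformation strength is large.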

The phase disturbance unit 10 disturbs the phase components of the smoothed spectrum input from the amplitude smoothing unit 9 according to the transformation strength input from the transformation strength control unit 20, and outputs the disturbed spectrum to the inverse Fourier transform unit 11. The disturbance is controlled so that it is larger for frequency components with a stronger transformation strength. The simplest way to control the magnitude of the disturbance is to apply it only where the input transformation strength is large. Various other methods can also be used, such as widening or narrowing the range of the phase angles generated by random numbers.
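Scaling the range of the random phase angle by the transformation strength can be sketched as follows. The uniform distribution, the `max_angle` default of π, and the optional `rng` argument are assumptions made for this example.

```python
import cmath
import math
import random


def disturb_phase(spectrum, strengths, max_angle=math.pi, rng=None):
    """Rotate each complex bin by a random angle scaled by its strength.

    The angle is drawn uniformly from [-s * max_angle, +s * max_angle],
    where s is the bin's strength clamped to [0, 1]. Amplitudes are
    preserved; a strength of 0 leaves the bin unchanged.
    """
    rng = rng or random.Random()
    result = []
    for x, s in zip(spectrum, strengths):
        s = min(max(s, 0.0), 1.0)
        angle = rng.uniform(-s * max_angle, s * max_angle)
        result.append(x * cmath.exp(1j * angle))
    return result
```

Passing a seeded `random.Random` instance makes the disturbance reproducible, which helps when comparing processed output across runs.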

The other components are the same as in Embodiment 1, so their description is omitted.

Here the outputs of both the level determination unit 23 and the continuity determination unit 24 are used, but a configuration that uses only one and omits the other is also possible. The transformation strength may also control only one of the amplitude smoothing unit 9 and the phase disturbance unit 10.

According to Embodiment 3, the transformation strength used to generate the processed signal (transformed decoded speech) is controlled for each frequency based on the amplitude of each frequency component of the input signal (decoded speech) or the perceptually weighted input signal, and on the continuity of the amplitude and phase at each frequency. In addition to the effects of Embodiment 1, processing is therefore concentrated on components in which quantization noise and degradation dominate because the amplitude spectrum component is small, and on components in which quantization noise and degradation tend to be large because the continuity of the spectral components is low. Good components with little quantization noise or degradation are no longer processed, so quantization noise and degraded components can be subjectively suppressed while the characteristics of the input signal and the actual background noise are retained comparatively well, which improves subjective quality.

Embodiment 4.
FIG. 7, in which parts corresponding to those in FIG. 5 carry the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 41 is an addition control value dividing unit, and the signal transformation unit 7 of FIG. 5 is replaced by a configuration consisting of the Fourier transform unit 8, the spectrum transformation unit 39, and the inverse Fourier transform unit 11.

The decoded speech 5 output from the speech decoding unit 4 is input to the Fourier transform unit 8, the deformation strength control unit 20, and the signal evaluation unit 12 in the signal processing unit 2.

As in Embodiment 2, the Fourier transform unit 8 applies a window to the signal formed by joining the input decoded speech 5 of the current frame with, where necessary, the most recent part of the decoded speech 5 of the previous frame. It then applies a Fourier transform to the windowed signal to calculate the spectral component at each frequency, and outputs the result as the decoded speech spectrum 43 to the weighted addition unit 18 and to the amplitude smoothing unit 9 in the spectrum transformation unit 39.

As in Embodiment 2, the spectrum transformation unit 39 applies the processing of the amplitude smoothing unit 9 and then that of the phase disturbance unit 10 to the input decoded speech spectrum 43, and outputs the resulting spectrum to the weighted addition unit 18 as the modified decoded speech spectrum 44.

In the deformation strength control unit 20, as in Embodiment 3, the input decoded speech 5 is processed in turn by the auditory weighting unit 21, the Fourier transform unit 22, the level determination unit 23, the continuity determination unit 24, and the deformation strength calculation unit 25, and the resulting deformation strength for each frequency is output to the addition control value division unit 41.

As in Embodiment 3, the auditory weighting unit 21 and the Fourier transform unit 22 are unnecessary when the encoding process does not use auditory weighting or when its effect is small. In that case, the output of the Fourier transform unit 8 may be supplied to the level determination unit 23 and the continuity determination unit 24.

Alternatively, the output of the Fourier transform unit 8 may be used as the input to the auditory weighting unit 21, which then applies auditory weighting to this input in the spectral domain. The Fourier transform unit 22 is omitted, and the auditorily weighted spectrum is output to the level determination unit 23 and the continuity determination unit 24. This configuration simplifies the processing.

As in Embodiment 1, the signal evaluation unit 12 obtains the background noise likeness of the input decoded speech 5 and outputs it as the addition control value 35 to the addition control value division unit 41.

The newly added addition control value division unit 41 uses the per-frequency deformation strength input from the deformation strength control unit 20 and the addition control value 35 input from the signal evaluation unit 12 to generate an addition control value 42 for each frequency, and outputs it to the weighted addition unit 18. For a frequency with a high deformation strength, the addition control value 42 of that frequency is set so that the weighted addition unit 18 gives less weight to the decoded speech spectrum 43 and more weight to the modified decoded speech spectrum 44. Conversely, for a frequency with a low deformation strength, the addition control value 42 of that frequency is set so that the weighted addition unit 18 gives more weight to the decoded speech spectrum 43 and less weight to the modified decoded speech spectrum 44. In other words, a frequency with a high deformation strength has a high background noise likeness, so its addition control value 42 is increased; in the opposite case it is decreased.
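The description does not give a formula for the division. As a minimal sketch, assuming both the addition control value 35 and the per-frequency deformation strengths lie in the range 0 to 1, one possible rule is a simple product, so that a higher deformation strength yields a higher per-frequency addition control value. The function name and the product rule are illustrative assumptions, not the method of the specification.

```python
def split_addition_control(control_value, deformation_strengths):
    """Return one addition control value per frequency bin.

    Hypothetical rule: scale the frame-level control value by the
    per-frequency deformation strength (both assumed to be in [0, 1]).
    """
    per_frequency = []
    for strength in deformation_strengths:
        # Clamp defensively so every result stays within [0, control_value].
        s = min(max(strength, 0.0), 1.0)
        per_frequency.append(control_value * s)
    return per_frequency
```

With this rule, a bin whose deformation strength is 0 always keeps the decoded speech spectrum unchanged, regardless of the frame-level background noise likeness.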

Based on the per-frequency addition control value 42 input from the addition control value division unit 41, the weighted addition unit 18 weights and adds the decoded speech spectrum 43 input from the Fourier transform unit 8 and the modified decoded speech spectrum 44 input from the spectrum transformation unit 39, and outputs the resulting spectrum to the inverse Fourier transform unit 11. The weighting is controlled in the same way as described with reference to FIG. 2. For a frequency component whose addition control value 42 is large (high background noise likeness), the weight for the decoded speech spectrum 43 is made small and the weight for the modified decoded speech spectrum 44 is made large. Conversely, for a frequency component whose addition control value 42 is small (low background noise likeness), the weight for the decoded speech spectrum 43 is made large and the weight for the modified decoded speech spectrum 44 is made small.
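The per-frequency weighted addition can be sketched as follows. The complementary weights (c for the modified spectrum, 1 - c for the decoded spectrum) are an assumption made for illustration; the specification only states that one weight grows as the other shrinks.

```python
def weighted_add_spectra(decoded, modified, control_values):
    """Mix two complex spectra bin by bin.

    decoded, modified: sequences of complex spectral components.
    control_values: per-frequency addition control values, assumed in [0, 1].
    """
    mixed = []
    for x, y, c in zip(decoded, modified, control_values):
        w_modified = min(max(c, 0.0), 1.0)  # large when noise-like
        w_decoded = 1.0 - w_modified        # large when speech-like
        mixed.append(w_decoded * x + w_modified * y)
    return mixed
```

A control value of 0 passes the decoded speech spectrum through untouched, and a control value of 1 replaces the bin entirely with the modified decoded speech spectrum.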

Finally, as in Embodiment 2, the inverse Fourier transform unit 11 applies an inverse Fourier transform to the spectrum input from the weighted addition unit 18 to return it to the signal domain, joins the result to the preceding and following frames while applying a window for smooth concatenation, and outputs the resulting signal as the output speech 6.

It is also possible to omit the addition control value division unit 41, supply the output of the signal evaluation unit 12 to the weighted addition unit 18, and supply the deformation strength output by the deformation strength control unit 20 to the amplitude smoothing unit 9 and the phase disturbance unit 10. This corresponds to the configuration of Embodiment 3 with the weighted addition performed in the spectral domain.

Further, as in Embodiment 3, a configuration is possible in which only one of the level determination unit 23 and the continuity determination unit 24 is used and the other is omitted.
According to Embodiment 4, the weighted addition of the spectrum of the input signal (decoded speech spectrum) and the processed spectrum (modified decoded speech spectrum) is controlled independently for each frequency component, based on the amplitude of each frequency component of the input signal (decoded speech) or of the auditorily weighted input signal, and on the degree of continuity of the amplitude and phase at each frequency. Therefore, in addition to the effects of Embodiment 1, the weight of the processed spectrum is increased selectively for components in which quantization noise and degraded components dominate because the amplitude spectrum component is small, and for components that tend to contain much quantization noise and degradation because the continuity of the spectral components is low. The weight of the processed spectrum is no longer increased for good components with little quantization noise or degradation, so quantization noise and degraded components can be subjectively suppressed while the characteristics of the input signal and the actual background noise are preserved relatively well, which improves subjective quality.

Compared with Embodiment 3, the two per-frequency deformation processes (smoothing and disturbance) are replaced by a single per-frequency process, which simplifies the processing.

Embodiment 5.
FIG. 8, in which parts corresponding to those in FIG. 5 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method of this embodiment is applied. In the figure, reference numeral 26 denotes a variability determination unit that determines the variability over time of the background noise likeness (addition control value 35).

The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7, the deformation strength control unit 20, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2. The signal evaluation unit 12 evaluates the background noise likeness of the input decoded speech 5 and outputs the evaluation result as the addition control value 35 to the variability determination unit 26 and the weighted addition unit 18.

The variability determination unit 26 compares the addition control value 35 input from the signal evaluation unit 12 with the past addition control values 35 stored internally, determines whether the variability of the value over time is high, calculates a third deformation strength based on this determination, and outputs it to the deformation strength calculation unit 25 in the deformation strength control unit 20. It then updates the stored past addition control values 35 using the newly input addition control value 35.
When a parameter that represents the characteristics of a frame (or subframe), such as the addition control value 35, varies strongly over time, the spectrum of the decoded speech 5 is in many cases also changing greatly over time, and applying stronger amplitude smoothing or phase disturbance than necessary produces an unnatural sense of reverberation. The third deformation strength is therefore set so that, when the variability of the addition control value 35 over time is high, the smoothing in the amplitude smoothing unit 9 and the disturbance applied in the phase disturbance unit 10 are weakened. The same effect can be obtained with parameters other than the addition control value 35, such as the power of the decoded speech or the spectral envelope parameters, provided that they represent the characteristics of the frame (or subframe).

In the simplest method of determining variability, the absolute value of the difference from the addition control value 35 of the previous frame is compared with a predetermined threshold, and the variability is judged to be high if it exceeds the threshold. Alternatively, the absolute values of the differences from the addition control values 35 of the previous frame and of the frame before it may each be calculated, and the determination made according to whether either exceeds a predetermined threshold. When the signal evaluation unit 12 calculates the addition control value 35 for each subframe, the absolute values of the differences in the addition control value 35 between all subframes in the current frame (and, where necessary, the previous frame) may be obtained, and the determination made according to whether any of them exceeds a predetermined threshold. As a concrete example, the third deformation strength is set to 0 if the threshold is exceeded and to 1 if it is not.
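The threshold test described above can be sketched as follows. The function name, argument names, and the choice to pass the stored history as a list are illustrative assumptions; the 0 and 1 outputs follow the concrete example in the text.

```python
def third_deformation_strength(current, past_values, threshold):
    """Return 0.0 when variability is high, otherwise 1.0.

    current: addition control value of the current frame.
    past_values: stored values of earlier frames, for example the previous
        frame only, or the previous frame and the one before it.
    threshold: predetermined limit on the absolute difference.
    """
    for past in past_values:
        if abs(current - past) > threshold:
            return 0.0  # high variability: weaken smoothing and disturbance
    return 1.0          # low variability: allow full processing
```

Passing a single past value gives the simplest variant, and passing two gives the variant that also checks the frame before the previous one.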

In the deformation strength control unit 20, the input decoded speech 5 is processed by the auditory weighting unit 21, the Fourier transform unit 22, the level determination unit 23, and the continuity determination unit 24 in the same way as in Embodiment 3.

The deformation strength calculation unit 25 then calculates the final deformation strength for each frequency from the first deformation strength input from the level determination unit 23, the second deformation strength input from the continuity determination unit 24, and the third deformation strength input from the variability determination unit 26, and outputs it to the amplitude smoothing unit 9 and the phase disturbance unit 10 in the signal transformation unit 7. One way to calculate the final deformation strength is to apply the third deformation strength as a constant value across all frequencies and then, for each frequency, take the minimum, a weighted average, or the maximum of this extended third deformation strength, the first deformation strength, and the second deformation strength.
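A minimal sketch of this combination step follows. The function name, the `mode` argument, and the default weights are assumptions made for illustration; the three combination rules (minimum, weighted average, maximum) are the ones named in the text.

```python
def final_deformation_strength(first, second, third, mode="min",
                               weights=(1.0, 1.0, 1.0)):
    """Combine three deformation strengths into one value per frequency.

    first, second: per-frequency strengths (level and continuity).
    third: single frame-level strength, applied to every frequency.
    mode: "min", "max", or "mean" (weighted average using `weights`).
    """
    combined = []
    for f, s in zip(first, second):
        candidates = (f, s, third)
        if mode == "min":
            value = min(candidates)
        elif mode == "max":
            value = max(candidates)
        elif mode == "mean":
            total = sum(w * c for w, c in zip(weights, candidates))
            value = total / sum(weights)
        else:
            raise ValueError("mode must be 'min', 'max', or 'mean'")
        combined.append(value)
    return combined
```

Taking the minimum is the most conservative choice: a third deformation strength of 0 then disables the processing at every frequency for that frame.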

The subsequent operations of the signal transformation unit 7 and the weighted addition unit 18 are the same as in Embodiment 3, and their description is omitted.

Although the outputs of both the level determination unit 23 and the continuity determination unit 24 are used here, a configuration that uses only one of them, or neither, is also possible. The deformation strength may also be applied to only one of the amplitude smoothing unit 9 and the phase disturbance unit 10, or the third deformation strength alone may be restricted to controlling only one of them.

According to Embodiment 5, in addition to the configuration of Embodiment 3, the smoothing strength or the disturbance strength is controlled according to the temporal variability (variability between frames or subframes) of a predetermined evaluation value (background noise likeness). Therefore, in addition to the effects of Embodiment 3, unnecessarily strong processing can be suppressed in intervals where the characteristics of the input signal (decoded speech) are changing, which prevents dullness and echo (a sense of reverberation).

Embodiment 6.
FIG. 9, in which parts corresponding to those in FIG. 5 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method of this embodiment is applied. In the figure, reference numeral 27 denotes a fricative likeness evaluation unit, 31 a background noise likeness evaluation unit, and 45 an addition control value calculation unit. The fricative likeness evaluation unit 27 comprises a low-cut filter 28, a zero-crossing counting unit 29, and a fricative likeness calculation unit 30. The background noise likeness evaluation unit 31 has the same configuration as the signal evaluation unit 12 in FIG. 5 and comprises an inverse filter unit 13, a power calculation unit 14, a background noise likeness calculation unit 15, an estimated noise power update unit 16, and an estimated noise spectrum update unit 17. Unlike in FIG. 5, the signal evaluation unit 12 comprises the fricative likeness evaluation unit 27, the background noise likeness evaluation unit 31, and the addition control value calculation unit 45.

The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7 and the deformation strength control unit 20 in the signal processing unit 2, to the fricative likeness evaluation unit 27 and the background noise likeness evaluation unit 31 in the signal evaluation unit 12, and to the weighted addition unit 18.

As with the signal evaluation unit 12 in Embodiment 3, the background noise likeness evaluation unit 31 in the signal evaluation unit 12 applies the processing of the inverse filter unit 13, the power calculation unit 14, and the background noise likeness calculation unit 15 to the input decoded speech 5, and outputs the resulting background noise likeness 46 to the addition control value calculation unit 45. It also performs the processing of the estimated noise power update unit 16 and the estimated noise spectrum update unit 17 to update the estimated noise power and the estimated noise spectrum stored in each.

The low-cut filter 28 in the fricative likeness evaluation unit 27 applies low-cut filtering, which suppresses low-frequency components, to the input decoded speech 5, and outputs the filtered decoded speech to the zero-crossing counting unit 29. The purpose of this filtering is to prevent a DC component or low-frequency component in the decoded speech from acting as an offset and reducing the count produced by the zero-crossing counting unit 29 described below. In the simplest case, therefore, it is sufficient to calculate the mean of the decoded speech 5 within the frame and subtract it from each sample of the decoded speech 5.

The zero-crossing counting unit 29 analyzes the speech input from the low-cut filter 28, counts the zero crossings it contains, and outputs the count to the fricative likeness calculation unit 30. Zero crossings can be counted, for example, by comparing the signs of adjacent samples and counting a crossing whenever they differ, or by taking the product of adjacent sample values and counting a crossing whenever the result is negative or zero.
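The simplest low-cut step (mean removal) and the sign-comparison counting method can be sketched together as follows. The function names are illustrative, and treating a sample that is exactly zero as non-negative is an assumption made here to avoid counting it twice.

```python
def remove_mean(frame):
    """Subtract the frame mean from every sample (simplest low-cut step)."""
    if not frame:
        return []
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def count_zero_crossings(frame):
    """Count sign changes between adjacent samples.

    A sample equal to zero is treated as non-negative (an assumption).
    """
    count = 0
    for previous, current in zip(frame, frame[1:]):
        if (previous >= 0) != (current >= 0):
            count += 1
    return count
```

For example, the frame `[5, 3, 5, 3]` has no sign changes as it stands, but after `remove_mean` it becomes `[1, -1, 1, -1]`, which contains three zero crossings. This is the offset problem that the low-cut filter 28 is meant to avoid.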

The fricative likeness calculation unit 30 compares the zero-crossing count input from the zero-crossing counting unit 29 with a predetermined threshold, obtains the fricative likeness 47 from the result of this comparison, and outputs it to the addition control value calculation unit 45. For example, when the zero-crossing count is greater than the threshold, the signal is judged to be fricative-like and the fricative likeness is set to 1. Conversely, when the zero-crossing count is smaller than the threshold, the signal is judged not to be fricative-like and the fricative likeness is set to 0. Alternatively, two or more thresholds may be provided so that the fricative likeness is set in steps, or a predetermined function may be prepared so that a continuous fricative likeness is calculated from the zero-crossing count.
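As a minimal sketch, the hard threshold and one possible continuous mapping can be written as a single function. The linear ramp between two thresholds is an assumption chosen for illustration; the specification only says that a predetermined function may be used.

```python
def fricative_likeness(zero_crossings, low_threshold, high_threshold=None):
    """Map a zero-crossing count to a fricative likeness in [0, 1].

    With one threshold the result is a hard 0/1 decision. With two
    thresholds the value rises linearly between them (hypothetical choice).
    """
    if high_threshold is None or high_threshold <= low_threshold:
        return 1.0 if zero_crossings > low_threshold else 0.0
    if zero_crossings <= low_threshold:
        return 0.0
    if zero_crossings >= high_threshold:
        return 1.0
    return (zero_crossings - low_threshold) / (high_threshold - low_threshold)
```

The thresholds would have to be tuned to the frame length and sampling rate, since the zero-crossing count scales with both.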

The configuration of the fricative likeness evaluation unit 27 described here is only an example. The evaluation may instead be based on an analysis of the spectral tilt, on the stationarity of the power or spectrum, or on a combination of several parameters including the zero-crossing count.

The addition control value calculation unit 45 calculates the addition control value 35 from the background noise likeness 46 input from the background noise likeness evaluation unit 31 and the fricative likeness 47 input from the fricative likeness evaluation unit 27, and outputs it to the weighted addition unit 18. Quantization noise is often unpleasant to hear both when the signal is background-noise-like and when it is fricative-like, so the addition control value 35 may be calculated as a suitably weighted sum of the background noise likeness 46 and the fricative likeness 47.
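A weighted sum of the two likeness values might look like the sketch below. The weights and the clamping of the result to the range 0 to 1 are assumptions; the specification leaves the exact weighting open.

```python
def addition_control_value(noise_likeness, fricative_likeness,
                           w_noise=1.0, w_fricative=1.0):
    """Combine background noise likeness and fricative likeness.

    Both inputs are assumed to lie in [0, 1]; the weighted sum is clamped
    to [0, 1] so it can be used directly as a mixing weight.
    """
    value = w_noise * noise_likeness + w_fricative * fricative_likeness
    return min(max(value, 0.0), 1.0)
```

With equal weights, either a strongly noise-like frame or a strongly fricative-like frame is enough to push the control value toward 1.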

The subsequent operations of the signal transformation unit 7, the deformation strength control unit 20, and the weighted addition unit 18 are the same as in Embodiment 3, and their description is omitted.

According to Embodiment 6, when the background noise likeness or the fricative likeness of the input signal (decoded speech) is high, the processed signal (modified decoded speech) is output with greater weight in place of the input signal (decoded speech). Therefore, in addition to the effects of Embodiment 3, processing is concentrated on fricative intervals, where quantization noise and degraded components tend to be large, while intervals other than fricatives receive the processing appropriate to them (no processing, low-level processing, and so on), which improves subjective quality. Besides fricative likeness, if other portions where quantization noise and degraded components tend to be large can be identified to some extent, the likeness of such a portion can be evaluated and reflected in the addition control value. In that configuration, large quantization noise and degraded components can be suppressed one by one, which improves subjective quality still further.

Naturally, a configuration without the background noise likeness evaluation unit is also possible.

Embodiment 7.
FIG. 10, in which parts corresponding to those in FIG. 1 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the signal processing method of this embodiment is applied. In the figure, reference numeral 32 denotes a post-filter unit.

First, the speech code 3 is input to the speech decoding unit 4 in the speech decoding apparatus 1.

The speech decoding unit 4 decodes the input speech code 3 and outputs the resulting decoded speech 5 to the post-filter unit 32, the signal transformation unit 7, and the signal evaluation unit 12.

The post-filter unit 32 applies spectrum enhancement, pitch periodicity enhancement, and similar processing to the input decoded speech 5, and outputs the result to the weighted addition unit 18 as the post-filter decoded speech 48. This post-filter processing is commonly used as post-processing for CELP decoding and is introduced to suppress the quantization noise generated by encoding and decoding. Because the parts of the spectrum with low intensity contain much quantization noise, the post filter suppresses the amplitude of those components. In some cases only the spectrum enhancement is performed, without the pitch periodicity enhancement.

Embodiment 1 and Embodiments 3 to 6 were described as applicable whether this post-filter processing is included in the speech decoding unit 4 or is absent altogether. In Embodiment 7, by contrast, all or part of the post-filter processing is taken out of a speech decoding unit 4 that includes it and made independent as the post-filter unit 32.

As in Embodiment 1, the signal transformation unit 7 applies the processing of the Fourier transform unit 8, the amplitude smoothing unit 9, the phase disturbance unit 10, and the inverse Fourier transform unit 11 to the input decoded speech 5, and outputs the resulting modified decoded speech 34 to the weighted addition unit 18.

As in Embodiment 1, the signal evaluation unit 12 evaluates the background noise likeness of the input decoded speech 5 and outputs the evaluation result to the weighted addition unit 18 as the addition control value 35.

Finally, as in Embodiment 1, the weighted addition unit 18 weights and adds the post-filter decoded speech 48 input from the post-filter unit 32 and the modified decoded speech 34 input from the signal transformation unit 7, based on the addition control value 35 input from the signal evaluation unit 12, and outputs the resulting output speech 6.

According to Embodiment 7, the modified decoded speech is generated from the decoded speech before post-filter processing, the background noise likeness is obtained by analyzing the decoded speech before post-filter processing, and the weights used when adding the post-filter decoded speech and the modified decoded speech are controlled on that basis. Therefore, in addition to the effects of Embodiment 1, a modified decoded speech that does not include the changes made by the post filter can be generated, and the addition weights can be controlled accurately from a background noise likeness that is calculated accurately without being affected by those changes, which further improves subjective quality.

In background noise intervals, the post filter often emphasizes even the degraded sound and makes it unpleasant to hear, so distortion is smaller when the modified decoded speech is generated starting from the decoded speech before post-filter processing. In addition, when the post filter has several modes and switches between them frequently, there is a high risk that the switching will affect the evaluation of the background noise likeness, so a more stable result is obtained by evaluating the background noise likeness on the decoded speech before post-filter processing.

When the post-filter unit is separated in the configuration of Embodiment 3 in the same way as in Embodiment 7, the output of the auditory weighting unit 21 in FIG. 5 comes closer to the auditorily weighted speech inside the encoding process, the accuracy of identifying components with much quantization noise increases, better transformation strength control is obtained, and the subjective quality is further improved.

Similarly, when the post-filter unit is separated in the configuration of Embodiment 6 in the same way as in Embodiment 7, the evaluation accuracy of the fricative likeness evaluation unit 27 in FIG. 9 increases, and the subjective quality is further improved.

Compared with the separated configuration of Embodiment 7, a configuration in which the post-filter unit is not separated connects to the speech decoding unit (including the post filter) at only one point, the decoded speech, and therefore has the advantage of being easy to implement as an independent device or program. Embodiment 7 has the drawback that it is not easy to implement as a device or program independent of a speech decoding unit that has a post filter, but it provides the various effects described above.

Embodiment 8.
FIG. 11, in which parts corresponding to those in FIG. 10 are given the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 33 denotes a spectrum parameter generated in the speech decoding unit 4. The differences from FIG. 10 are that a transformation strength control unit 20 similar to that of Embodiment 3 is added, and that the spectrum parameter 33 is input from the speech decoding unit 4 to the signal evaluation unit 12 and the transformation strength control unit 20.

The speech decoding unit 4 decodes the input speech code 3 and outputs the resulting decoded speech 5 to the post-filter unit 32, the signal transformation unit 7, the transformation strength control unit 20, and the signal evaluation unit 12. It also outputs the spectrum parameter 33 generated during decoding to the estimated noise spectrum update unit 17 in the signal evaluation unit 12 and to the auditory weighting unit 21 in the transformation strength control unit 20. Linear prediction coefficients (LPC), line spectrum pairs (LSP), and the like are commonly used as the spectrum parameter 33.

The auditory weighting unit 21 in the transformation strength control unit 20 applies auditory weighting to the decoded speech 5 input from the speech decoding unit 4, using the spectrum parameter 33 that is also input from the speech decoding unit 4, and outputs the resulting auditorily weighted speech to the Fourier transform unit 22. Specifically, if the spectrum parameter 33 is a set of linear prediction coefficients (LPC), it is used as is; otherwise it is first converted to LPC. The LPC are then multiplied by constants to obtain two modified sets of LPC, an ARMA filter with these two sets as its filter coefficients is constructed, and auditory weighting is performed by filtering with this filter. This auditory weighting is preferably the same as that used in the speech encoding process (the counterpart of the speech decoding performed by the speech decoding unit 4).
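The ARMA weighting filter built from two constant-scaled LPC sets can be sketched as below. The sketch assumes the form commonly used in CELP coders (where it is often called perceptual weighting), W(z) = A(z/g1) / A(z/g2) with A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, so that the "constant multiplication" scales the i-th coefficient by the i-th power of a constant. The function names and the default constants 0.9 and 0.6 are illustrative assumptions, not values taken from this document, and the filter state is not carried across frames.

```python
def expand_bandwidth(lpc, gamma):
    """Return the coefficients of A(z/gamma) given A(z) = 1 + sum a_i z^-i.

    lpc holds a_1..a_p (the leading 1 is implicit).
    """
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def auditory_weighting(signal, lpc, gamma_num=0.9, gamma_den=0.6):
    """Filter signal with W(z) = A(z/gamma_num) / A(z/gamma_den).

    gamma_num and gamma_den are hypothetical example constants.
    The filter starts from zero state for every call.
    """
    num = expand_bandwidth(lpc, gamma_num)   # MA (numerator) coefficients
    den = expand_bandwidth(lpc, gamma_den)   # AR (denominator) coefficients
    out = []
    for n, x in enumerate(signal):
        y = x
        for i, (b, a) in enumerate(zip(num, den), start=1):
            if n - i >= 0:
                # y[n] = x[n] + sum b_i x[n-i] - sum a_i y[n-i]
                y += b * signal[n - i] - a * out[n - i]
        out.append(y)
    return out
```

A real decoder would take the constants from the matching encoder, in line with the text's recommendation to use the same weighting as the encoding process.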

Within the transformation strength control unit 20, following the processing of the auditory weighting unit 21, the Fourier transform unit 22, level determination unit 23, continuity determination unit 24, and transformation strength calculation unit 25 operate as in Embodiment 3, and the resulting transformation strength is output to the signal transformation unit 7.

As in Embodiment 3, the signal transformation unit 7 applies the processing of the Fourier transform unit 8, amplitude smoothing unit 9, phase disturbance unit 10, and inverse Fourier transform unit 11 to the input decoded speech 5 and transformation strength, and outputs the resulting transformed decoded speech 34 to the weighted addition unit 18.
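The four-stage chain in the signal transformation unit (Fourier transform, amplitude smoothing, phase disturbance, inverse Fourier transform) might look like the following sketch. It makes several assumptions that are not stated in this section: the transformation strength is taken to be one value between 0 and 1 per frequency bin, the smoothing is a 3-point moving average along the frequency axis, and the phase disturbance is a uniformly distributed random offset. Windowing and overlap-add between frames are omitted for brevity.

```python
import numpy as np


def transform_frame(frame, strength, rng=None):
    """Smooth the amplitude spectrum and disturb the phase of one frame.

    strength is assumed to hold one value in [0, 1] per rfft bin
    (0 leaves the bin untouched); this normalisation is hypothetical.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(frame)
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Amplitude smoothing: blend each bin with a 3-point moving average.
    padded = np.pad(amplitude, 1, mode="edge")
    smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
    amplitude = (1.0 - strength) * amplitude + strength * smoothed

    # Phase disturbance: add a random offset scaled by the strength.
    phase = phase + strength * rng.uniform(-np.pi, np.pi, size=phase.shape)

    # Return to the signal domain.
    return np.fft.irfft(amplitude * np.exp(1j * phase), n=len(frame))
```

Passing an all-zero strength returns the input frame unchanged (up to rounding), which is a convenient sanity check when tuning the strength calculation.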

Within the signal evaluation unit 12, as in Embodiment 1, the input decoded speech 5 is first processed by the inverse filter unit 13, power calculation unit 14, and background-noise likeness calculation unit 15 to evaluate the background-noise likeness, and the evaluation result is output to the weighted addition unit 18 as the addition control value 35. The estimated noise power update unit 16 then updates the internally held estimated noise power.

The estimated noise spectrum update unit 17 then updates its internally stored estimated noise spectrum using the spectrum parameter 33 input from the speech decoding unit 4 and the background-noise likeness input from the background-noise likeness calculation unit 15. For example, when the input background-noise likeness is high, the update is performed by reflecting the spectrum parameter 33 in the estimated noise spectrum according to the equation given in Embodiment 1.
The subsequent operations of the post-filter unit 32 and the weighted addition unit 18 are the same as in Embodiment 7, so their description is omitted.
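The estimated noise spectrum update described above can be illustrated with a simple leaky average. The equation of Embodiment 1 is not reproduced in this section, so the update rule below is a stand-in chosen for illustration, and the names `threshold` and `beta` as well as their default values are hypothetical.

```python
def update_noise_spectrum(estimate, spectrum_param, noise_likeness,
                          threshold=0.5, beta=0.9):
    """Return the new estimated noise spectrum parameters.

    threshold and beta are hypothetical tuning values; the update is a
    plain leaky average, not necessarily the equation of Embodiment 1.
    """
    if noise_likeness < threshold:
        # The frame looks like speech: keep the current estimate.
        return list(estimate)
    # The frame looks like background noise: move the estimate toward it.
    return [beta * e + (1.0 - beta) * s
            for e, s in zip(estimate, spectrum_param)]
```

If the spectrum parameter is an LPC set, averaging in the LSP domain is usually better behaved, because a weighted average of two stable LSP sets remains a stable filter, whereas this is not guaranteed for raw LPC coefficients.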

According to Embodiment 8, the spectrum parameter generated during speech decoding is reused for the auditory weighting and for updating the estimated noise spectrum, so in addition to the effects of Embodiments 3 and 7, the processing is simplified.

Furthermore, exactly the same auditory weighting as in the encoding process is achieved, so the accuracy of identifying components with much quantization noise increases, better transformation strength control is obtained, and the subjective quality improves.

In addition, the estimation accuracy of the estimated noise spectrum used to calculate the background-noise likeness increases (in the sense that it is closer to the spectrum of the speech input to the speech encoding process), and the resulting stable, accurate background-noise likeness enables accurate addition weight control, which improves the subjective quality.

Although Embodiment 8 uses a configuration in which the post-filter unit 32 is separated from the speech decoding unit 4, the processing of the signal processing unit 2 can also reuse the spectrum parameter 33 output by the speech decoding unit 4, as in Embodiment 8, in a configuration where they are not separated. The same effects as in Embodiment 8 are obtained in that case as well.

Embodiment 9.
In the configuration of Embodiment 4 shown in FIG. 7, the addition control value division unit 41 can also control the transformation strength it outputs so that the approximate shape of the transformed decoded speech spectrum 44, after it has been multiplied by the per-frequency weights applied in the weighted addition unit 18, matches the estimated spectral shape of the quantization noise.

FIG. 12 is a schematic diagram showing an example of the decoded speech spectrum 43 and of the transformed decoded speech spectrum 44 after multiplication by the per-frequency weights in this case.

Quantization noise whose spectral shape depends on the coding scheme is superimposed on the decoded speech spectrum 43. CELP-type speech coding schemes search for the code that minimizes the distortion in the speech after auditory weighting. The quantization noise therefore has a flat spectral shape in the auditorily weighted speech, and the final quantization noise has a spectral shape that is the inverse of the auditory weighting characteristic. Accordingly, it is possible to obtain the spectral characteristic of the auditory weighting, derive the spectral shape of its inverse characteristic, and control the output of the addition control value division unit 41 so that the spectral shape of the transformed decoded speech spectrum matches it.
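The inverse-characteristic shape mentioned above can be computed directly from the LPC. The sketch below assumes the weighting filter has the common CELP form W(z) = A(z/g1) / A(z/g2), so the estimated quantization noise shape is |1/W| = |A(z/g2)| / |A(z/g1)| evaluated on the unit circle. The function names, the frequency grid from 0 to pi, and the default constants are assumptions made for the example.

```python
import cmath
import math


def poly_response(coeffs, omega):
    """Evaluate 1 + sum c_i e^{-j omega i} for coefficients c_1..c_p."""
    return 1.0 + sum(c * cmath.exp(-1j * omega * (i + 1))
                     for i, c in enumerate(coeffs))


def noise_shape(lpc, n_bins, gamma_num=0.9, gamma_den=0.6):
    """Magnitude of 1/W at n_bins frequencies spread evenly over [0, pi].

    W(z) = A(z/gamma_num) / A(z/gamma_den); both constants are
    hypothetical example values.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    num = [a * gamma_num ** (i + 1) for i, a in enumerate(lpc)]
    den = [a * gamma_den ** (i + 1) for i, a in enumerate(lpc)]
    shape = []
    for k in range(n_bins):
        omega = math.pi * k / (n_bins - 1)
        # |1/W| = |A(z/gamma_den)| / |A(z/gamma_num)|
        shape.append(abs(poly_response(den, omega)) /
                     abs(poly_response(num, omega)))
    return shape
```

For a first-order low-pass envelope such as `lpc = [-0.9]`, the resulting shape is larger at low frequencies than at high frequencies, that is, the estimated noise partially follows the speech envelope, which is the behaviour expected of this kind of weighting.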

According to Embodiment 9, the spectral shape of the transformed decoded speech component contained in the final output speech 6 is made to match the approximate shape of the estimated quantization noise spectrum. In addition to the effects of Embodiment 4, this makes the unpleasant quantization noise in speech sections harder to hear by adding transformed decoded speech with the minimum necessary power.

Embodiment 10.
In the configurations of Embodiment 1 and Embodiments 3 to 8, the amplitude smoothing unit 9 can also shape the smoothed amplitude spectrum so that it matches the amplitude spectrum shape of the estimated quantization noise. The amplitude spectrum shape of the estimated quantization noise may be calculated as in Embodiment 9.

According to Embodiment 10, the spectral shape of the transformed decoded speech is made to match the estimated spectral shape of the quantization noise. In addition to the effects of Embodiment 1 and Embodiments 3 to 8, this makes the unpleasant quantization noise in speech sections harder to hear by adding transformed decoded speech with the minimum necessary power.

Embodiment 11.
In Embodiment 1 and Embodiments 3 to 10, the signal processing unit 2 is used to process the decoded speech 5, but the signal processing unit 2 alone can also be taken out and used for other signal processing, for example by connecting it after an acoustic signal decoding unit (a decoding unit for acoustic signal coding) or after noise suppression processing. In that case, however, the transformation processing in the signal transformation unit and the evaluation method in the signal evaluation unit need to be changed and adjusted according to the characteristics of the degraded component to be eliminated.

According to Embodiment 11, signals other than decoded speech that contain degraded components can also be processed so that subjectively undesirable components are less perceptible.

Embodiment 12.
In Embodiments 1 to 11, the signal is processed using the signal up to the current frame, but a configuration that accepts a processing delay and also uses the signal of the next and later frames is possible.

According to Embodiment 12, the signal of the next and later frames can be referenced, which improves the smoothing characteristics of the amplitude spectrum, the accuracy of the continuity determination, and the accuracy of evaluations such as the noise likeness.

Embodiment 13.
In Embodiment 1, Embodiment 3, and Embodiments 5 to 12, the spectral components are calculated by a Fourier transform, transformed, and returned to the signal domain by an inverse Fourier transform. Instead of the Fourier transform, a configuration is also possible in which the transformation is applied to each output of a bank of band-pass filters and the signal is reconstructed by adding the per-band signals.

According to Embodiment 13, the same effects are obtained even in a configuration that does not use a Fourier transform.

Embodiment 14.
Embodiments 1 to 13 include both the amplitude smoothing unit 9 and the phase disturbance unit 10, but a configuration that omits one of them is possible, as is a configuration that introduces yet another transformation unit.

According to Embodiment 14, depending on the characteristics of the quantization noise or degraded sound to be eliminated, the processing can be simplified by omitting a transformation unit that brings no benefit. Introducing an appropriate transformation unit can also be expected to eliminate quantization noise and degraded sound that the amplitude smoothing unit 9 and phase disturbance unit 10 cannot.

FIG. 1 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 1 of the present invention is applied.
FIG. 2 is a diagram showing an example of how the weighted addition is controlled based on the addition control value in the weighted addition unit 18 of Embodiment 1 of the present invention.
FIG. 3 is an explanatory diagram showing examples of the actual shapes of the cut-out window in the Fourier transform unit 8 and the window for concatenation in the inverse Fourier transform unit 11 of Embodiment 1 of the present invention, and their time relationship with the decoded speech 5.
FIG. 4 is a diagram showing part of the configuration of a speech decoding apparatus to which the sound signal processing method of Embodiment 2 of the present invention is applied in combination with a noise suppression method.
FIG. 5 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 3 of the present invention is applied.
FIG. 6 is a diagram showing the relationship between the auditorily weighted spectrum and the first transformation strength in Embodiment 3 of the present invention.
FIG. 7 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 4 of the present invention is applied.
FIG. 8 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 5 of the present invention is applied.
FIG. 9 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 6 of the present invention is applied.
FIG. 10 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 7 of the present invention is applied.
FIG. 11 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 8 of the present invention is applied.
FIG. 12 is a schematic diagram showing an example of the decoded speech spectrum 43 and of the transformed decoded speech spectrum 44 after multiplication by the per-frequency weights when Embodiment 9 of the present invention is applied.

Claims

A decoded speech generation step of decoding a speech code to generate decoded speech and generating predetermined information based on the speech code,
A first processed speech generation step of processing the decoded speech to generate first processed speech,
An evaluation value calculation step of calculating a predetermined evaluation value based on the information,
A second processed speech generation step of generating second processed speech by weighting and adding the decoded speech and the first processed speech based on the evaluation value,
An output step of outputting the second processed speech as output speech.