WO2006046547A1 - Speech coding apparatus and speech coding method - Google Patents
- Publication number
- WO2006046547A1 (PCT/JP2005/019579)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- spectrum
- layer
- standard deviation
- nonlinear
- unit
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Definitions
- the present invention relates to a speech coding apparatus and speech coding method, and more particularly to a speech coding apparatus and speech coding method suitable for scalable coding.
- an approach that hierarchically integrates a plurality of coding techniques is promising.
- One such approach uses a first layer that encodes the input signal at a low bit rate with a model suited to speech signals, and a second layer that encodes the difference signal between the input signal and the first layer decoded signal with a model that can also represent non-speech signals.
- Conventional scalable coding includes, for example, scalable coding using a technique standardized by MPEG-4 (Moving Picture Experts Group phase-4) (see Non-Patent Document 1).
- In that scheme, CELP (Code Excited Linear Prediction), which is suited to speech signals, is used in the first layer, and a frequency-domain transform coding technique such as AAC (Advanced Audio Coding) or TwinVQ (Transform-domain Weighted Interleave Vector Quantization) is used in the second layer.
- Patent Document 1 Japanese Patent No. 3299073
- Non-Patent Document 1 edited by Satoshi Miki, All of MPEG-4, first edition, Industrial Research Co., Ltd., September 30, 1998, p.126-127
- An object of the present invention is to provide a speech coding apparatus and speech coding method that can improve quantization performance while minimizing an increase in bit rate.
- The speech coding apparatus of the present invention performs coding with a hierarchical structure composed of a plurality of layers, and comprises: analysis means for frequency-analyzing a lower layer decoded signal to calculate a lower layer decoded spectrum; selection means for selecting one of a plurality of nonlinear transformation functions based on the degree of variation in the lower layer decoded spectrum; inverse transformation means for inverse-transforming a nonlinearly transformed residual spectrum using the nonlinear transformation function selected by the selection means; and addition means for adding the inverse-transformed residual spectrum to the lower layer decoded spectrum to obtain an upper layer decoded spectrum.
- FIG. 1 is a block diagram showing a configuration of a speech encoding apparatus according to Embodiment 1 of the present invention.
- FIG. 2 is a block diagram showing a configuration of a second layer encoding unit according to Embodiment 1 of the present invention.
- FIG. 3 is a block diagram showing a configuration of an error comparison unit according to the first embodiment of the present invention.
- FIG. 4 is a block diagram showing a configuration of a second layer encoding unit according to Embodiment 1 of the present invention (a modified example).
- FIG. 5 is a graph showing the relationship between the standard deviation of the first layer decoded spectrum and the standard deviation of the error spectrum according to Embodiment 1 of the present invention.
- FIG. 6 is a diagram showing a method for estimating a standard deviation of an error spectrum according to Embodiment 1 of the present invention.
- FIG. 7 is a diagram showing an example of a nonlinear conversion function according to Embodiment 1 of the present invention.
- FIG. 8 is a block diagram showing the configuration of the speech decoding apparatus according to Embodiment 1 of the present invention.
- FIG. 9 is a block diagram showing the configuration of the second layer decoding unit according to Embodiment 1 of the present invention.
- FIG. 10 is a block diagram showing a configuration of an error comparison unit according to the second embodiment of the present invention.
- FIG. 11 is a block diagram showing a configuration of a second layer encoding unit according to Embodiment 3 of the present invention.
- FIG. 12 is a diagram showing a method for estimating a standard deviation of an error spectrum according to Embodiment 3 of the present invention.
- FIG. 13 is a block diagram showing a configuration of a second layer decoding unit according to Embodiment 3 of the present invention.
Best Mode for Carrying Out the Invention
- In this embodiment, scalable coding is performed with a hierarchical structure composed of a plurality of layers.
- the hierarchical structure of the scalable code is: a first layer (lower layer) and a second layer (upper layer) higher than the first layer.
- the second layer encoding is performed in the frequency domain (transform coding).
- The second layer encoding uses the MDCT (Modified Discrete Cosine Transform).
- In the second layer encoding, the input signal band is divided into a plurality of subbands (frequency bands), and encoding is performed for each subband.
- The subband division follows the critical bands, dividing the band at equal intervals on the Bark scale.
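The patent only states that the split follows the critical bands at equal Bark intervals; as a rough sketch, Zwicker's Bark-scale approximation can be used, with the bisection search and band count below being illustrative assumptions:

```python
import math

def hz_to_bark(f):
    """Zwicker's approximation of the Bark (critical band) scale."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)

def bark_band_edges(fs, n_bands):
    """Subband edges (Hz) equally spaced on the Bark scale up to fs/2,
    found by bisection since hz_to_bark has no simple closed-form inverse."""
    top = hz_to_bark(fs / 2)
    targets = [top * i / n_bands for i in range(n_bands + 1)]
    edges = []
    for t in targets:
        lo, hi = 0.0, fs / 2
        for _ in range(60):          # bisect to machine precision
            mid = (lo + hi) / 2
            if hz_to_bark(mid) < t:
                lo = mid
            else:
                hi = mid
        edges.append(lo)
    return edges

edges = bark_band_edges(16000, 8)
```

For a 16 kHz sampling rate this yields narrow subbands at low frequencies and wide ones at high frequencies, matching the critical-band structure.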
- FIG. 1 shows the configuration of a speech coding apparatus according to Embodiment 1 of the present invention.
- First layer encoding unit 10 encodes the input speech signal (original signal) and outputs the resulting encoding parameters to first layer decoding unit 20 and multiplexing unit 50.
- First layer decoding unit 20 generates a first layer decoded signal from the encoding parameters output from first layer encoding unit 10, and outputs it to second layer encoding unit 40.
- the delay unit 30 gives a predetermined length of delay to the input audio signal (original signal) and outputs the delayed signal to the second layer coding unit 40.
- This delay is for adjusting the time delay generated in the first layer encoding unit 10 and the first layer decoding unit 20.
- Second layer encoding unit 40 spectrally encodes the original signal output from delay unit 30 using the first layer decoded signal output from first layer decoding unit 20, and outputs the encoding parameters obtained by this spectral encoding to multiplexing unit 50.
- Multiplexing unit 50 multiplexes the encoding parameters output from first layer encoding unit 10 and the encoding parameters output from second layer encoding unit 40, and outputs the result as a bit stream.
- FIG. 2 shows the configuration of second layer encoding unit 40.
- MDCT analysis unit 401 frequency-analyzes the first layer decoded signal output from first layer decoding unit 20 by MDCT to calculate MDCT coefficients (the first layer decoded spectrum), and outputs the first layer decoded spectrum to scale factor encoding unit 404 and multiplier 405.
- MDCT analysis unit 402 frequency-analyzes the original signal output from delay unit 30 by MDCT to calculate MDCT coefficients (the original spectrum), and outputs the original spectrum to scale factor encoding unit 404 and error comparison unit 406.
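The MDCT used by both analysis units can be sketched directly from its definition; this is an O(N²) reference implementation written from the textbook formula, not the fast algorithm a real codec would use:

```python
import numpy as np

def mdct(frame):
    """Direct MDCT of a 2N-sample frame into N coefficients:
    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))."""
    two_n = len(frame)
    n = two_n // 2
    ns = np.arange(two_n)[:, None]          # time index column
    ks = np.arange(n)[None, :]              # frequency index row
    basis = np.cos(np.pi / n * (ns + 0.5 + n / 2) * (ks + 0.5))
    return frame @ basis

coeffs = mdct(np.ones(8))   # 8 time samples -> 4 MDCT coefficients
```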
- Auditory masking calculation unit 403 uses the original signal output from delay unit 30 to calculate the auditory masking for each subband of predetermined bandwidth, and notifies error comparison unit 406 of the auditory masking.
- Human hearing exhibits an auditory masking characteristic: while one signal is being heard, a sound at a nearby frequency entering the ear is difficult to perceive.
- Auditory masking exploits this characteristic to realize efficient spectral coding: few quantization bits are allocated to frequency bands where quantization distortion is hard to hear, and many bits are allocated to bands where quantization distortion is easy to hear.
- Scale factor encoding unit 404 encodes the scale factor (information representing the spectral envelope). The average amplitude of each subband is used as this information.
- Scale factor encoding unit 404 calculates the scale factor of each subband of the first layer decoded signal from the first layer decoded spectrum output from MDCT analysis unit 401, and likewise calculates the scale factor of each subband of the original signal from the original spectrum output from MDCT analysis unit 402.
- Scale factor encoding unit 404 then calculates the ratio of the scale factor of the first layer decoded signal to that of the original signal, and outputs the encoding parameter obtained by encoding this scale factor ratio to scale factor decoding unit 407 and multiplexing unit 50.
- Scale factor decoding unit 407 decodes the scale factor ratio from the encoding parameter output from scale factor encoding unit 404, and outputs the decoded ratio (decoded scale factor ratio) to multiplier 405.
- Multiplier 405 multiplies each subband of the first layer decoded spectrum output from MDCT analysis unit 401 by the corresponding decoded scale factor ratio output from scale factor decoding unit 407, and outputs the multiplication result to standard deviation calculation unit 408 and adder 413. As a result, the scale factor of the first layer decoded spectrum approaches that of the original spectrum.
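The scale-factor adjustment can be sketched as follows. The subband split and spectra are hypothetical, and the encode/decode round trip of the ratio is skipped (the ratio is used directly), so this is an idealized sketch:

```python
import numpy as np

def subband_scale_factors(spec, band_edges):
    """Mean amplitude per subband -- the 'scale factor' of the patent."""
    return np.array([np.abs(spec[s:e]).mean()
                     for s, e in zip(band_edges[:-1], band_edges[1:])])

# hypothetical original and first-layer decoded spectra, split into 2 subbands
orig = np.array([1.0, 2.0, 3.0, 4.0])
dec = np.array([0.5, 1.0, 1.5, 2.0])
bands = [0, 2, 4]

# ratio of original to decoded scale factor, per subband
sf_ratio = subband_scale_factors(orig, bands) / subband_scale_factors(dec, bands)

# multiplier 405: scale each subband of the decoded spectrum by its ratio
adjusted = dec.copy()
for i, (s, e) in enumerate(zip(bands[:-1], bands[1:])):
    adjusted[s:e] *= sf_ratio[i]
```

After the multiplication, the envelope of `adjusted` matches that of `orig`, which is exactly the point of the step: only the fine structure (the residual) remains for the second layer to encode.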
- Standard deviation calculation unit 408 calculates the standard deviation σc of the first layer decoded spectrum after multiplication by the decoded scale factor ratio, and outputs σc to selection unit 409.
- Here, the spectrum is separated into amplitude values and sign information, and the standard deviation is calculated over the amplitude values. In this way, the variation in the first layer decoded spectrum is quantified.
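A minimal sketch of this variation measure, assuming σc is simply the standard deviation of the MDCT amplitude values (the text specifies only that signs are stripped first):

```python
import numpy as np

def spectrum_std(spec):
    """Degree of variation of a spectrum: separate each coefficient into
    sign and amplitude, then take the standard deviation of the amplitudes."""
    amplitudes = np.abs(spec)          # drop the sign information
    return float(np.std(amplitudes))

# hypothetical first-layer decoded spectrum
decoded_spec = np.array([0.5, -1.0, 2.0, -0.25, 0.75])
sigma_c = spectrum_std(decoded_spec)
```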
- Based on the standard deviation σc output from standard deviation calculation unit 408, selection unit 409 selects which nonlinear transformation function inverse transformation unit 411 should use to nonlinearly inverse-transform the residual spectrum, and outputs information indicating the selection result to nonlinear transformation function unit 410.
- Nonlinear transformation function unit 410 holds a plurality of nonlinear transformation functions #1 to #N prepared in advance and, based on the selection result from selection unit 409, outputs one of them to inverse transformation unit 411.
- the residual spectrum codebook 412 stores a plurality of residual spectrum candidates obtained by compressing the residual spectrum by nonlinear transformation.
- the residual spectrum candidates stored in the residual spectrum codebook 412 may be scalars or vectors.
- Residual spectrum codebook 412 is designed in advance using training data.
- Inverse transformation unit 411 inverse-transforms (expands) each residual spectrum candidate stored in residual spectrum codebook 412 using the nonlinear transformation function output from nonlinear transformation function unit 410, and outputs the result to adder 413. In this way, second layer encoding unit 40 is configured to minimize the error of the signal after expansion.
- Adder 413 adds the residual spectrum candidate after inverse transformation (after decompression) to the first layer decoded spectrum after multiplication of the decoding scale factor ratio, and outputs the result to error comparison section 406.
- the spectrum obtained as a result of this addition corresponds to the candidate for the second layer decoded spectrum.
- In this way, second layer encoding unit 40 includes the same configuration as the second layer decoding unit of the speech decoding apparatus described later, and generates candidates for the second layer decoded spectrum that the second layer decoding unit will produce.
- Error comparison unit 406 compares, for some or all of the residual spectrum candidates in residual spectrum codebook 412, the original spectrum with the second layer decoded spectrum candidate using the auditory masking notified from auditory masking calculation unit 403, and searches residual spectrum codebook 412 for the most suitable residual spectrum candidate. Error comparison unit 406 then outputs the encoding parameter representing the found residual spectrum to multiplexing unit 50.
- The configuration of error comparison unit 406 is shown in FIG. 3. In FIG. 3, subtractor 4061 generates an error spectrum by subtracting the second layer decoded spectrum candidate from the original spectrum, and outputs it to masking-to-error ratio calculation unit 4062.
- Masking-to-error ratio calculation unit 4062 calculates the ratio of the auditory masking to the magnitude of the error spectrum (the masking-to-error ratio), quantifying how perceptible the error spectrum is to human hearing. The larger the masking-to-error ratio, the smaller the error spectrum relative to the auditory masking, and hence the smaller the distortion perceived by humans.
- Search unit 4063 searches, among some or all residual spectrum candidates in residual spectrum codebook 412, for the candidate giving the largest masking-to-error ratio (that is, the smallest perceived error spectrum), and outputs the encoding parameter indicating the found candidate to multiplexing unit 50.
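The closed-loop search over candidates can be sketched as follows. The exact masking-to-error formula is not given in the text, so the power-ratio form below is an assumption:

```python
import numpy as np

def masking_to_error_ratio(orig_spec, candidate, masking):
    """Assumed form: masking power over error power. Larger means the
    error is better hidden below the masking threshold."""
    err = orig_spec - candidate
    return float(np.sum(masking ** 2) / (np.sum(err ** 2) + 1e-12))

def search_best(orig_spec, candidates, masking):
    """Return the index (the encoding parameter) of the candidate with
    the highest masking-to-error ratio."""
    ratios = [masking_to_error_ratio(orig_spec, c, masking) for c in candidates]
    return int(max(range(len(ratios)), key=ratios.__getitem__))

# hypothetical candidates: the second one is much closer to the original
candidates = [np.zeros(2), np.array([1.0, 0.9])]
best = search_best(np.array([1.0, 1.0]), candidates, np.ones(2))
```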
- The configuration of second layer encoding unit 40 may also be the same as that shown in FIG. 2 except that scale factor encoding unit 404 and scale factor decoding unit 407 are omitted.
- the first layer decoded spectrum is supplied to adder 413 without the amplitude value being corrected by the scale factor.
- In this case, the expanded residual spectrum is added directly to the first layer decoded spectrum.
- Although the configuration described above inverse-transforms (expands) the residual spectrum in inverse transformation unit 411, the following configuration may be adopted instead. That is, a target residual spectrum is generated by subtracting the first layer decoded spectrum after scale factor ratio multiplication from the original spectrum; this target residual spectrum is forward-transformed (compressed) using the selected nonlinear transformation function; and the residual spectrum closest to the nonlinearly transformed target residual spectrum is searched for in the residual spectrum codebook. In this configuration, a forward transformation unit that forward-transforms (compresses) the target residual spectrum using the nonlinear transformation function is used instead of inverse transformation unit 411.
- Residual spectrum codebook 412 may also comprise residual spectrum codebooks #1 to #N corresponding to the respective nonlinear transformation functions #1 to #N; in that case, the selection result information is also input to residual spectrum codebook 412, and the corresponding residual spectrum codebook is selected.
- The graph in FIG. 5 shows the relationship between the standard deviation σc of the first layer decoded spectrum and the standard deviation σe of the error spectrum obtained by subtracting the first layer decoded spectrum from the original spectrum. This graph shows results for about 30 seconds of an audio signal.
- The error spectrum here corresponds to the spectrum that the second layer encodes. It is therefore important that the error spectrum be encoded with a small number of bits and high quality (so that auditory distortion is small).
- In this embodiment, the standard deviation σe of the error spectrum is estimated from the standard deviation σc of the first layer decoded spectrum, and the nonlinear transformation function optimal for this estimated σe is selected from nonlinear transformation functions #1 to #N.
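A minimal sketch of this estimation, assuming the FIG. 5 relationship is modeled by a linear fit; the slope, intercept, and thresholds below are illustrative placeholders, not values from the patent:

```python
def estimate_sigma_e(sigma_c, slope=0.6, intercept=0.05):
    """Estimate the error-spectrum deviation sigma_e from the decoded-
    spectrum deviation sigma_c. FIG. 5 shows a strong correlation; the
    linear model and its coefficients here are assumptions."""
    return slope * sigma_c + intercept

def select_function_index(sigma_e_hat, thresholds=(0.2, 0.5, 1.0)):
    """Map the estimated sigma_e onto one of N = 4 nonlinear functions
    (#0..#3) by comparing against illustrative thresholds."""
    for i, t in enumerate(thresholds):
        if sigma_e_hat < t:
            return i
    return len(thresholds)
```

Because both encoder and decoder can compute σc from the first layer decoded spectrum, both sides can run this same mapping, which is what lets the selection stay untransmitted.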
- In FIG. 5, the horizontal axis represents the standard deviation σc of the first layer decoded spectrum (its degree of variation), and the vertical axis represents the standard deviation σe of the error spectrum (its degree of variation).
- FIG. 7 shows examples of the nonlinear transformation functions.
- The nonlinear transformation function is selected by selection unit 409 according to the estimated standard deviation of the encoding target (estimated in this embodiment from the standard deviation σc of the first layer decoded spectrum); that is, a suitable nonlinear transformation function is selected according to the magnitude of the estimated standard deviation σe of the error spectrum.
- As the nonlinear transformation function, for example, the function used in μ-law PCM, as expressed by Equation (1), can be used.
- Here, a and B are constants that define the characteristics of the nonlinear transformation function, and sgn() denotes the function returning the sign.
- For an error spectrum with a small standard deviation, a nonlinear transformation function with a small value of the constant in Equation (1) is used; for an error spectrum with a large standard deviation, a function with a large value is used. Since the appropriate value depends on the nature of the first layer coding, it is determined in advance using training data.
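Equation (1) itself is not reproduced in this text, so the standard μ-law companding pair is shown below as an assumed stand-in (inputs normalized to [-1, 1]); the compression/expansion symmetry is the point, since the decoder applies the inverse (expansion) side:

```python
import math

def mu_law_compress(x, mu=255.0):
    """mu-law companding as in mu-law PCM: boosts small amplitudes so a
    uniform quantizer covers the large spectral dynamic range."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=255.0):
    """Inverse (expansion) transform, used on the decoder side."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)
```

A smaller μ makes the function closer to linear (suited to small-deviation spectra), while a larger μ compresses more aggressively, which mirrors the selection rule described above.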
- A function represented by Equation (2) may also be used as the nonlinear transformation function.
- A is a constant that defines the characteristics of the nonlinear function.
- Multiple nonlinear transformation functions with different bases a are prepared in advance, and which one to use when encoding the error spectrum is selected based on the standard deviation σc of the first layer decoded spectrum.
- For an error spectrum with a small standard deviation, a nonlinear transformation function with a small base a is used; for one with a large standard deviation, a function with a large base a is used. Since the appropriate a depends on the nature of the first layer coding, it is determined in advance using training data.
- These nonlinear transformation functions are given as examples; the present invention is not limited by the kind of nonlinear transformation function used.
- The dynamic range of the spectral amplitude values (the ratio of the maximum to the minimum amplitude value) is very large, so applying linear quantization with a uniform step size to the amplitude spectrum requires a very large number of bits. When the number of coding bits is limited, a small step size causes large-amplitude spectral values to be clipped, producing large quantization error in the clipped portions; conversely, a large step size produces large quantization error for small-amplitude spectral values. Compressing the spectrum with a nonlinear transformation before quantization mitigates this problem.
- The present invention is not limited to this: the spectrum may be divided into a plurality of subbands, the standard deviation of the error spectrum estimated from the standard deviation of the first layer decoded spectrum for each subband, and the spectrum of each subband encoded using the nonlinear transformation function optimal for that estimated standard deviation.
- The degree of variation of the first layer decoded signal spectrum tends to be larger at lower frequencies and smaller at higher frequencies.
- a plurality of nonlinear transformation functions designed and prepared for each of a plurality of subbands may be used.
- In this case, a configuration is adopted in which nonlinear transformation function units 410 are provided per subband; that is, the nonlinear transformation function unit corresponding to each subband has its own set of nonlinear transformation functions #1 to #N.
- Selection unit 409 then selects, for each subband, one of the nonlinear transformation functions #1 to #N prepared for that subband.
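Per-subband selection can be sketched as follows; the subband split, thresholds, and the use of `searchsorted` as the threshold comparison are illustrative assumptions:

```python
import numpy as np

def per_subband_selection(decoded_spec, band_edges, thresholds):
    """Select a nonlinear-function index independently in every subband,
    from that subband's own amplitude standard deviation sigma_c."""
    indices = []
    for s, e in zip(band_edges[:-1], band_edges[1:]):
        sigma_c = np.std(np.abs(decoded_spec[s:e]))
        # count how many thresholds sigma_c exceeds -> function index
        indices.append(int(np.searchsorted(thresholds, sigma_c)))
    return indices

# hypothetical spectrum: a high-variation low band, a flat high band
spec = np.array([5.0, -5.0, 0.0, 10.0, 0.1, 0.1, 0.1, 0.1])
indices = per_subband_selection(spec, [0, 4, 8], thresholds=[1.0])
```

This matches the observation above: the low band's larger variation selects a different (stronger) function than the nearly flat high band.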
- Separation unit 60 separates the input bit stream into the encoding parameters for the first layer and the encoding parameters for the second layer, and outputs them to first layer decoding unit 70 and second layer decoding unit 80, respectively.
- The encoding parameters for the first layer are those obtained by first layer encoding unit 10; for example, when first layer encoding unit 10 uses CELP (Code Excited Linear Prediction), they consist of LPC coefficients, lag, excitation signal, gain information, and so on.
- The encoding parameters for the second layer are the encoding parameter for the scale factor ratio and the encoding parameter for the residual spectrum.
- First layer decoding unit 70 generates the first layer decoded signal from the first layer encoding parameters, outputs it to second layer decoding unit 80, and also outputs it as a low-quality decoded signal when necessary.
- Second layer decoding unit 80 generates a second layer decoded signal, that is, a high-quality decoded signal, using the first layer decoded signal, the encoding parameter of the scale factor ratio, and the encoding parameter of the residual spectrum, and outputs this decoded signal as necessary.
- In this way, the first layer decoded signal ensures a minimum quality of reproduced speech, and the second layer decoded signal enhances that quality. Which of the two is output depends on, for example, whether the second layer encoding parameters can be obtained in the given network environment (occurrence of packet loss and the like) and on application settings.
- second layer decoding section 80 will be described in more detail.
- The configuration of second layer decoding unit 80 is shown in FIG. 9. The scale factor decoding unit 801, MDCT analysis unit 802, multiplier 803, standard deviation calculation unit 804, selection unit 805, nonlinear transformation function unit 806, inverse transformation unit 807, residual spectrum codebook 808, and adder 809 shown in FIG. 9 correspond, respectively, to scale factor decoding unit 407, MDCT analysis unit 401, multiplier 405, standard deviation calculation unit 408, selection unit 409, nonlinear transformation function unit 410, inverse transformation unit 411, residual spectrum codebook 412, and adder 413 provided in second layer encoding unit 40 (FIG. 2) of the speech coding apparatus, and corresponding components have the same functions.
- Scale factor decoding unit 801 decodes the scale factor ratio from the scale factor ratio encoding parameter, and outputs the decoded ratio (decoded scale factor ratio) to multiplier 803.
- MDCT analysis unit 802 frequency-analyzes the first layer decoded signal by MDCT to calculate MDCT coefficients (the first layer decoded spectrum), and outputs the first layer decoded spectrum to multiplier 803.
- Multiplier 803 multiplies each subband of the first layer decoded spectrum output from MDCT analysis unit 802 by the corresponding decoded scale factor ratio output from scale factor decoding unit 801, and outputs the multiplication result to standard deviation calculation unit 804 and adder 809. As a result, the scale factor of the first layer decoded spectrum approaches that of the original spectrum.
- Standard deviation calculation unit 804 calculates the standard deviation σc of the first layer decoded spectrum after multiplication by the decoded scale factor ratio, and outputs σc to selection unit 805. By calculating the standard deviation, the degree of variation of the first layer decoded spectrum is quantified.
- Based on the standard deviation σc output from standard deviation calculation unit 804, selection unit 805 selects which nonlinear transformation function inverse transformation unit 807 should use to nonlinearly inverse-transform the residual spectrum, and outputs information indicating the selection result to nonlinear transformation function unit 806.
- Nonlinear transformation function unit 806 holds a plurality of nonlinear transformation functions #1 to #N prepared in advance and, based on the selection result from selection unit 805, outputs one of them to inverse transformation unit 807.
- the residual spectrum codebook 808 stores a plurality of residual spectrum candidates obtained by compressing the residual spectrum by nonlinear transformation.
- the residual spectrum candidates stored in the residual spectrum codebook 808 may be scalars or vectors.
- Residual spectrum codebook 808 is designed in advance using training data.
- Inverse transformation unit 807 inverse-transforms (expands) one of the residual spectrum candidates stored in residual spectrum codebook 808 using the nonlinear transformation function output from nonlinear transformation function unit 806, and outputs the result to adder 809. The residual spectrum candidate to be inverse-transformed is selected according to the residual spectrum encoding parameter input from separation unit 60.
- Adder 809 adds the inverse-transformed (expanded) residual spectrum candidate to the first layer decoded spectrum after decoded scale factor ratio multiplication, and outputs the result to time domain transformation unit 810.
- the spectrum obtained as a result of this addition corresponds to the second layer decoded spectrum in the frequency domain.
- After converting the second layer decoded spectrum into a time domain signal, time domain transformation unit 810 applies processing such as appropriate windowing and overlap-add as necessary to avoid discontinuities between frames, and outputs the final high-quality decoded signal.
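The windowing and overlap-add step can be sketched as follows, using a sine synthesis window and 50% overlap; the patent does not specify the window or overlap, so both are assumptions:

```python
import numpy as np

def overlap_add(frames, hop):
    """Window each time-domain frame and sum overlapping frames, hiding
    the discontinuities at frame boundaries."""
    n = len(frames[0])
    # sine window, a common MDCT synthesis window (assumed here)
    win = np.sin(np.pi * (np.arange(n) + 0.5) / n)
    out = np.zeros(hop * (len(frames) - 1) + n)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + n] += win * f
    return out

# two hypothetical 4-sample frames with 50% overlap (hop = 2)
signal = overlap_add([np.ones(4), np.ones(4)], hop=2)
```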
- Thus, in this embodiment, the degree of variation of the error spectrum to be encoded in the second layer is estimated from the degree of variation of the first layer decoded spectrum, and the nonlinear transformation function is selected accordingly. The speech decoding apparatus can therefore select the nonlinear transformation function in the same manner as the speech coding apparatus, without the selection information being transmitted from the speech coding apparatus to the speech decoding apparatus. For this reason, in this embodiment, the quantization performance can be improved without increasing the bit rate.
- FIG. 10 shows the configuration of error comparison section 406 according to Embodiment 2 of the present invention.
- Error comparison unit 406 according to the present embodiment includes weighted error calculation unit 4064 in place of masking-to-error ratio calculation unit 4062 in the configuration of Embodiment 1 (FIG. 3).
- In FIG. 10, the same components as those in FIG. 3 are given the same reference numerals, and their description is omitted.
- Weighted error calculation unit 4064 multiplies the error spectrum output from subtractor 4061 by a weighting function determined by the auditory masking, and calculates its energy (the weighted error energy).
- The weighting function is determined by the magnitude of the auditory masking: at frequencies where the masking is large, distortion is hard to hear, so a small weight is set; conversely, at frequencies where the masking is small, distortion is easy to hear, so a large weight is set. Weighted error calculation unit 4064 thus calculates the energy with weights that reduce the influence of the error spectrum at strongly masked frequencies and increase it at weakly masked frequencies, and outputs the calculated energy value to search unit 4063.
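A sketch of the weighted error energy, assuming the weight is simply the reciprocal of the masking level; the patent specifies only that larger masking gets a smaller weight, so this particular weighting function is an assumption:

```python
import numpy as np

def weighted_error_energy(orig_spec, candidate, masking, eps=1e-12):
    """Embodiment 2 criterion: error components at well-masked
    frequencies count less toward the total energy."""
    err = orig_spec - candidate
    weights = 1.0 / (masking + eps)   # large masking -> small weight
    return float(np.sum(weights * err ** 2))
```

The search then minimizes this energy over the candidates, rather than maximizing the masking-to-error ratio of Embodiment 1.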
- Search section 4063 searches, among some or all of the residual spectrum candidates in residual spectrum codebook 412, for the candidate that minimizes the weighted error energy, and outputs the encoding parameter representing that residual spectrum candidate to multiplexing section 50.
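The search itself can be sketched as follows (a minimal illustration; in the apparatus the candidates come from residual spectrum codebook 412 and the weights from the masking analysis):

```python
def weighted_error_energy(error_spectrum, weights):
    """Energy of the error spectrum after perceptual weighting."""
    return sum(w * e * e for w, e in zip(weights, error_spectrum))

def search_codebook(target, candidates, weights):
    """Return the index of the residual-spectrum candidate minimizing the
    weighted error energy; this index is the encoding parameter that is
    passed on for multiplexing."""
    best_idx, best_energy = 0, float("inf")
    for idx, cand in enumerate(candidates):
        err = [t - c for t, c in zip(target, cand)]
        energy = weighted_error_energy(err, weights)
        if energy < best_energy:
            best_idx, best_energy = idx, energy
    return best_idx

target = [1.0, -0.5, 0.25]
codebook = [[0.0, 0.0, 0.0], [1.0, -0.5, 0.0], [0.9, -0.4, 0.3]]
print(search_codebook(target, codebook, [1.0, 1.0, 1.0]))  # → 2
```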
- FIG. 11 shows the configuration of second layer encoding section 40 according to Embodiment 3 of the present invention.
- Second layer encoding section 40 according to the present embodiment includes selection encoding section 414 in place of selection section 409 in the configuration of Embodiment 1. In FIG. 11, the same components as those of Embodiment 1 are denoted by the same reference numerals, and their description is omitted.
- Selection encoding section 414 receives the first layer decoded spectrum after multiplication by the decoded scale factor ratio from multiplier 405, and receives the standard deviation σc of the first layer decoded spectrum from standard deviation calculation section 408. The original spectrum is also input to selection encoding section 414 from MDCT analysis section 402.
- Selection encoding section 414 first limits the values that the estimated standard deviation of the error spectrum can take, based on the standard deviation σc. Next, it obtains the error spectrum between the original spectrum and the first layer decoded spectrum after multiplication by the decoded scale factor ratio, calculates the standard deviation of this error spectrum, and selects from the limited set of estimated standard deviations the one closest to this measured value. Then, according to the selected estimated standard deviation (the degree of variation of the error spectrum), selection encoding section 414 selects a nonlinear transformation function in the same manner as in Embodiment 1, and outputs the encoding parameter obtained by encoding the selected estimated standard deviation to multiplexing section 50.
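Hedged sketch of this two-step procedure (the candidate grid of scale factors applied to σc is an assumption for illustration; the patent only specifies that the candidates are limited based on σc):

```python
def encode_estimated_std(sigma_c, sigma_e_actual):
    """Limit the candidate estimated standard deviations of the error
    spectrum based on sigma_c of the first layer decoded spectrum, then
    pick the candidate closest to the measured sigma_e; the candidate's
    index serves as the encoding parameter."""
    candidates = [0.25 * sigma_c, 0.5 * sigma_c, 0.75 * sigma_c, 1.0 * sigma_c]
    best_idx = min(range(len(candidates)),
                   key=lambda i: abs(candidates[i] - sigma_e_actual))
    return best_idx, candidates[best_idx]

idx, est = encode_estimated_std(sigma_c=2.0, sigma_e_actual=1.1)
print(idx, est)  # → 1 1.0
```

Because only an index into a small, σc-dependent candidate set is transmitted, the extra bit cost stays low while the estimate tracks the measured error-spectrum deviation.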
- Multiplexing section 50 multiplexes the encoding parameter output from first layer encoding section 10, the encoding parameter output from second layer encoding section 40, and the encoding parameter output from selection encoding section 414, and outputs the result as a bit stream.
- In the figure referred to here, the horizontal axis represents the standard deviation σc of the first layer decoded spectrum, and the vertical axis represents the standard deviation σe of the error spectrum.
- In this way, the values that the estimated standard deviation of the error spectrum can take are limited to a plurality of candidates based on the standard deviation of the first layer decoded spectrum, and the candidate closest to the standard deviation of the error spectrum between the original spectrum and the first layer decoded spectrum after decoded scale factor ratio multiplication is selected from the limited candidates and encoded. A more accurate standard deviation is thus obtained than when the estimate is derived from the standard deviation of the first layer decoded spectrum alone, so the quantization performance, and hence the speech quality, can be further improved.
- Second layer decoding section 80 according to Embodiment 3 of the present invention includes selection section 811 in place of selection section 805 in the configuration of Embodiment 1 (FIG. 9). In FIG. 13, the same components as those in FIG. 9 are denoted by the same reference numerals, and their description is omitted.
- The encoding parameter of the selection information separated by demultiplexing section 60 is input to selection section 811.
- Based on the estimated standard deviation indicated by the selection information, selection section 811 selects which nonlinear transformation function to use for the nonlinear transformation of the residual spectrum, and outputs information indicating the selection result to nonlinear transformation section 806.
- Note that the standard deviation of the error spectrum may be encoded directly, without using the standard deviation of the first layer decoded spectrum. In that case, the quantization performance can be improved even for frames in which the correlation between the standard deviation of the first layer decoded spectrum and that of the error spectrum is small.
- In the above embodiments, the standard deviation is used as the index representing the degree of variation of the spectrum. However, the variance, the difference or ratio between the maximum amplitude spectrum and the minimum amplitude spectrum, or the like may be used instead.
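The interchangeable variation indices mentioned above can be computed side by side as follows (a simple sketch on amplitude spectra; the choice of index is a design decision, not fixed by the patent):

```python
import math

def variation_indices(spectrum):
    """Alternative indices of spectral variation: standard deviation,
    variance, and the ratio of the maximum to the minimum amplitude."""
    amps = [abs(x) for x in spectrum]
    n = len(amps)
    mean = sum(amps) / n
    var = sum((a - mean) ** 2 for a in amps) / n
    return {
        "std": math.sqrt(var),
        "variance": var,
        "max_min_ratio": max(amps) / max(min(amps), 1e-12),
    }

print(variation_indices([0.5, -1.0, 0.25, 2.0]))
```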
- Although the above embodiments have been described for the case where the MDCT is used as the transform method, the present invention is not limited to this, and can be applied similarly when other transform methods, such as the DFT, the cosine transform, or the wavelet transform, are used.
- The hierarchical structure of the scalable coding has been described as two layers, the first layer (lower layer) and the second layer (upper layer), but the present invention is not limited to this and can be applied similarly to scalable coding having three or more layers. In that case, any one of the plurality of layers may be regarded as the first layer of the above embodiments and a layer higher than that layer as the second layer, and the present invention can be applied in the same way.
- The present invention is also applicable when the layers handle signals of different sampling rates. When the sampling rate of the signal handled by the n-th layer is denoted Fs(n), the relationship Fs(n) ≤ Fs(n+1) holds.
- The speech encoding apparatus and speech decoding apparatus according to the above embodiments can also be mounted on a radio communication apparatus such as a radio communication mobile station apparatus or a radio communication base station apparatus used in a mobile communication system.
- Each functional block used in the description of the above embodiments is typically realized as an LSI, which is an integrated circuit. These blocks may be formed as individual chips, or some or all of them may be integrated into a single chip. Although the term LSI is used here, the circuit may also be called an IC, a system LSI, a super LSI, or an ultra LSI, depending on the degree of integration.
- The method of circuit integration is not limited to LSI; implementation using dedicated circuitry or general-purpose processors is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.
- Furthermore, if integrated-circuit technology replacing LSI emerges from advances in semiconductor technology or other derivative technologies, the functional blocks may naturally be integrated using that technology. Application of biotechnology or the like is also a possibility.
- The present invention is suitable for use in communication apparatuses of mobile communication systems and of packet communication systems using the Internet protocol.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BRPI0518193-3A BRPI0518193A (pt) | 2004-10-27 | 2005-10-25 | aparelho e método de codificação vocal, aparelhos de estação móvel e de base de comunicação de rádio |
EP05799366A EP1806737A4 (en) | 2004-10-27 | 2005-10-25 | TONE CODIER AND TONE CODING METHOD |
JP2006543163A JP4859670B2 (ja) | 2004-10-27 | 2005-10-25 | 音声符号化装置および音声符号化方法 |
US11/577,424 US8099275B2 (en) | 2004-10-27 | 2005-10-25 | Sound encoder and sound encoding method for generating a second layer decoded signal based on a degree of variation in a first layer decoded signal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2004-312262 | 2004-10-27 | ||
JP2004312262 | 2004-10-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2006046547A1 true WO2006046547A1 (ja) | 2006-05-04 |
Family
ID=36227787
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2005/019579 WO2006046547A1 (ja) | 2004-10-27 | 2005-10-25 | 音声符号化装置および音声符号化方法 |
Country Status (8)
Country | Link |
---|---|
US (1) | US8099275B2 (ja) |
EP (1) | EP1806737A4 (ja) |
JP (1) | JP4859670B2 (ja) |
KR (1) | KR20070070189A (ja) |
CN (1) | CN101044552A (ja) |
BR (1) | BRPI0518193A (ja) |
RU (1) | RU2007115914A (ja) |
WO (1) | WO2006046547A1 (ja) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2009501944A (ja) * | 2005-07-15 | 2009-01-22 | マイクロソフト コーポレーション | ディジタル・メディア・スペクトル・データの効率的コーディングに使用される辞書内のコードワードの変更 |
US20090109964A1 (en) * | 2007-10-23 | 2009-04-30 | Samsung Electronics Co., Ltd. | APPARATUS AND METHOD FOR PLAYOUT SCHEDULING IN VOICE OVER INTERNET PROTOCOL (VoIP) SYSTEM |
WO2010103854A3 (ja) * | 2009-03-13 | 2011-03-03 | パナソニック株式会社 | 音声符号化装置、音声復号装置、音声符号化方法及び音声復号方法 |
JP2011518345A (ja) * | 2008-03-14 | 2011-06-23 | ドルビー・ラボラトリーズ・ライセンシング・コーポレーション | スピーチライク信号及びノンスピーチライク信号のマルチモードコーディング |
CN101582259B (zh) * | 2008-05-13 | 2012-05-09 | 华为技术有限公司 | 立体声信号编解码方法、装置及编解码系统 |
US9349376B2 (en) | 2007-06-29 | 2016-05-24 | Microsoft Technology Licensing, Llc | Bitstream syntax for multi-process audio decoding |
US9443525B2 (en) | 2001-12-14 | 2016-09-13 | Microsoft Technology Licensing, Llc | Quality improvement techniques in an audio encoder |
WO2020179472A1 (ja) * | 2019-03-05 | 2020-09-10 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
Families Citing this family (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4771674B2 (ja) * | 2004-09-02 | 2011-09-14 | パナソニック株式会社 | 音声符号化装置、音声復号化装置及びこれらの方法 |
KR20080049085A (ko) | 2005-09-30 | 2008-06-03 | 마츠시타 덴끼 산교 가부시키가이샤 | 음성 부호화 장치 및 음성 부호화 방법 |
US7991611B2 (en) * | 2005-10-14 | 2011-08-02 | Panasonic Corporation | Speech encoding apparatus and speech encoding method that encode speech signals in a scalable manner, and speech decoding apparatus and speech decoding method that decode scalable encoded signals |
BRPI0619258A2 (pt) * | 2005-11-30 | 2011-09-27 | Matsushita Electric Ind Co Ltd | aparelho de codificação de sub-banda e método de codificação de sub-banda |
ATE501505T1 (de) * | 2006-04-27 | 2011-03-15 | Panasonic Corp | Audiocodierungseinrichtung, audiodecodierungseinrichtung und verfahren dafür |
US8560328B2 (en) * | 2006-12-15 | 2013-10-15 | Panasonic Corporation | Encoding device, decoding device, and method thereof |
US20090006081A1 (en) * | 2007-06-27 | 2009-01-01 | Samsung Electronics Co., Ltd. | Method, medium and apparatus for encoding and/or decoding signal |
CN101527138B (zh) * | 2008-03-05 | 2011-12-28 | 华为技术有限公司 | 超宽带扩展编码、解码方法、编解码器及超宽带扩展系统 |
CN102081927B (zh) * | 2009-11-27 | 2012-07-18 | 中兴通讯股份有限公司 | 一种可分层音频编码、解码方法及系统 |
WO2012052802A1 (en) * | 2010-10-18 | 2012-04-26 | Nokia Corporation | An audio encoder/decoder apparatus |
WO2016162283A1 (en) * | 2015-04-07 | 2016-10-13 | Dolby International Ab | Audio coding with range extension |
Family Cites Families (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2956548B2 (ja) * | 1995-10-05 | 1999-10-04 | 松下電器産業株式会社 | 音声帯域拡大装置 |
JPH08278800A (ja) * | 1995-04-05 | 1996-10-22 | Fujitsu Ltd | 音声通信システム |
JP3299073B2 (ja) * | 1995-04-11 | 2002-07-08 | パイオニア株式会社 | 量子化装置及び量子化方法 |
US5884269A (en) * | 1995-04-17 | 1999-03-16 | Merging Technologies | Lossless compression/decompression of digital audio data |
KR100261254B1 (ko) * | 1997-04-02 | 2000-07-01 | 윤종용 | 비트율 조절이 가능한 오디오 데이터 부호화/복호화방법 및 장치 |
JPH10288852A (ja) | 1997-04-14 | 1998-10-27 | Canon Inc | 電子写真感光体 |
US6615169B1 (en) * | 2000-10-18 | 2003-09-02 | Nokia Corporation | High frequency enhancement layer coding in wideband speech codec |
US6614370B2 (en) * | 2001-01-26 | 2003-09-02 | Oded Gottesman | Redundant compression techniques for transmitting data over degraded communication links and/or storing data on media subject to degradation |
US20020133246A1 (en) * | 2001-03-02 | 2002-09-19 | Hong-Kee Kim | Method of editing audio data and recording medium thereof and digital audio player |
US6947886B2 (en) * | 2002-02-21 | 2005-09-20 | The Regents Of The University Of California | Scalable compression of audio and other signals |
DE60214599T2 (de) * | 2002-03-12 | 2007-09-13 | Nokia Corp. | Skalierbare audiokodierung |
US7275036B2 (en) * | 2002-04-18 | 2007-09-25 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for coding a time-discrete audio signal to obtain coded audio data and for decoding coded audio data |
WO2003091989A1 (en) * | 2002-04-26 | 2003-11-06 | Matsushita Electric Industrial Co., Ltd. | Coding device, decoding device, coding method, and decoding method |
JP3881946B2 (ja) * | 2002-09-12 | 2007-02-14 | 松下電器産業株式会社 | 音響符号化装置及び音響符号化方法 |
FR2849727B1 (fr) * | 2003-01-08 | 2005-03-18 | France Telecom | Procede de codage et de decodage audio a debit variable |
US7787632B2 (en) * | 2003-03-04 | 2010-08-31 | Nokia Corporation | Support of a multichannel audio extension |
DE602004004950T2 (de) * | 2003-07-09 | 2007-10-31 | Samsung Electronics Co., Ltd., Suwon | Vorrichtung und Verfahren zum bitraten-skalierbaren Sprachkodieren und -dekodieren |
-
2005
- 2005-10-25 EP EP05799366A patent/EP1806737A4/en not_active Withdrawn
- 2005-10-25 KR KR1020077009516A patent/KR20070070189A/ko not_active Application Discontinuation
- 2005-10-25 CN CNA2005800360114A patent/CN101044552A/zh active Pending
- 2005-10-25 RU RU2007115914/09A patent/RU2007115914A/ru not_active Application Discontinuation
- 2005-10-25 BR BRPI0518193-3A patent/BRPI0518193A/pt not_active Application Discontinuation
- 2005-10-25 WO PCT/JP2005/019579 patent/WO2006046547A1/ja active Application Filing
- 2005-10-25 JP JP2006543163A patent/JP4859670B2/ja not_active Expired - Fee Related
- 2005-10-25 US US11/577,424 patent/US8099275B2/en active Active
Non-Patent Citations (1)
Title |
---|
OSHIKIRI MASAHIRO ET AL: "Jikan-Shuhasu Ryoki no Keisu no teio Sentaku Vector Ryoshika o Mochiita 10kHz Taiiki Scalable Fugoka Hoshiki. (A 10 KHZ bandwith scalable codec using adaptive selection VQ of time-frequency coefficients)", FIT2003 KOEN RONBUNSHU., 25 August 2003 (2003-08-25), pages 239 - 240, (F-017), XP002986229 * |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9443525B2 (en) | 2001-12-14 | 2016-09-13 | Microsoft Technology Licensing, Llc | Quality improvement techniques in an audio encoder |
JP2009501944A (ja) * | 2005-07-15 | 2009-01-22 | マイクロソフト コーポレーション | ディジタル・メディア・スペクトル・データの効率的コーディングに使用される辞書内のコードワードの変更 |
US9349376B2 (en) | 2007-06-29 | 2016-05-24 | Microsoft Technology Licensing, Llc | Bitstream syntax for multi-process audio decoding |
US9741354B2 (en) | 2007-06-29 | 2017-08-22 | Microsoft Technology Licensing, Llc | Bitstream syntax for multi-process audio decoding |
US20090109964A1 (en) * | 2007-10-23 | 2009-04-30 | Samsung Electronics Co., Ltd. | APPARATUS AND METHOD FOR PLAYOUT SCHEDULING IN VOICE OVER INTERNET PROTOCOL (VoIP) SYSTEM |
US8615045B2 (en) * | 2007-10-23 | 2013-12-24 | Samsung Electronics Co., Ltd | Apparatus and method for playout scheduling in voice over internet protocol (VoIP) system |
JP2011518345A (ja) * | 2008-03-14 | 2011-06-23 | ドルビー・ラボラトリーズ・ライセンシング・コーポレーション | スピーチライク信号及びノンスピーチライク信号のマルチモードコーディング |
CN101582259B (zh) * | 2008-05-13 | 2012-05-09 | 华为技术有限公司 | 立体声信号编解码方法、装置及编解码系统 |
WO2010103854A3 (ja) * | 2009-03-13 | 2011-03-03 | パナソニック株式会社 | 音声符号化装置、音声復号装置、音声符号化方法及び音声復号方法 |
WO2020179472A1 (ja) * | 2019-03-05 | 2020-09-10 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
JP7533440B2 (ja) | 2019-03-05 | 2024-08-14 | ソニーグループ株式会社 | 信号処理装置および方法、並びにプログラム |
US12170092B2 (en) | 2019-03-05 | 2024-12-17 | Sony Group Corporation | Signal processing device, method, and program |
Also Published As
Publication number | Publication date |
---|---|
RU2007115914A (ru) | 2008-11-10 |
CN101044552A (zh) | 2007-09-26 |
KR20070070189A (ko) | 2007-07-03 |
EP1806737A4 (en) | 2010-08-04 |
JPWO2006046547A1 (ja) | 2008-05-22 |
US20080091440A1 (en) | 2008-04-17 |
JP4859670B2 (ja) | 2012-01-25 |
EP1806737A1 (en) | 2007-07-11 |
US8099275B2 (en) | 2012-01-17 |
BRPI0518193A (pt) | 2008-11-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101220621B1 (ko) | 부호화 장치 및 부호화 방법 | |
US8457319B2 (en) | Stereo encoding device, stereo decoding device, and stereo encoding method | |
JP5383676B2 (ja) | 符号化装置、復号装置およびこれらの方法 | |
US7983904B2 (en) | Scalable decoding apparatus and scalable encoding apparatus | |
JP4859670B2 (ja) | 音声符号化装置および音声符号化方法 | |
JP2010538316A (ja) | 改良された音声及びオーディオ信号の変換符号化 | |
JP5036317B2 (ja) | スケーラブル符号化装置、スケーラブル復号化装置、およびこれらの方法 | |
US8010349B2 (en) | Scalable encoder, scalable decoder, and scalable encoding method | |
CN112352277B (zh) | 编码装置及编码方法 | |
CN102436822A (zh) | 信号控制装置及其方法 | |
Kandadai et al. | Optimal Bit Layering for Scalable Audio Compression Using Objective Audio Quality Metrics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BW BY BZ CA CH CN CO CR CU CZ DK DM DZ EC EE EG ES FI GB GD GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV LY MD MG MK MN MW MX MZ NA NG NO NZ OM PG PH PL PT RO RU SC SD SG SK SL SM SY TJ TM TN TR TT TZ UG US UZ VC VN YU ZA ZM |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GM KE LS MW MZ NA SD SZ TZ UG ZM ZW AM AZ BY KG MD RU TJ TM AT BE BG CH CY DE DK EE ES FI FR GB GR HU IE IS IT LU LV MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: 2006543163 Country of ref document: JP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2005799366 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 11577424 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 200580036011.4 Country of ref document: CN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2007115914 Country of ref document: RU Ref document number: 1020077009516 Country of ref document: KR |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWP | Wipo information: published in national office |
Ref document number: 2005799366 Country of ref document: EP |
|
WWP | Wipo information: published in national office |
Ref document number: 11577424 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: PI0518193 Country of ref document: BR |