EP2907324B1 - System and method for reducing latency in transposer-based virtual bass systems - Google Patents
- Publication number
- EP2907324B1 EP13771123.0A
- Authority
- EP
- European Patent Office
- Prior art keywords
- frequency
- signal
- cqmf
- virtual bass
- audio signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/04—Circuits for transducers, loudspeakers or microphones for correcting frequency response
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R2430/00—Signal processing covered by H04R, not provided for in its groups
- H04R2430/03—Synergistic effects of band splitting and sub-band processing
Definitions
- One or more embodiments relate generally to transform-based audio signal processing, and more specifically to reducing latency in transposer-based virtual bass synthesis systems.
- Bass synthesis refers to methods of adding components to the low frequency range of a signal in order to enhance the perceived bass.
- a sub-bass synthesis technique creates low frequency components below the existing partials of a signal in order to extend and improve the lowest frequency range present in the subject audio content.
- Another method uses virtual pitch algorithms that generate audible harmonics from an inaudible bass range (e.g., low pitched bass played through small loudspeakers), hence making the harmonics, and ultimately also the pitch, audible in order to improve the bass response.
- Virtual bass synthesis is a virtual pitch method that increases the perceived level of bass content in audio when played on small loudspeakers that cannot physically reproduce the low-end bass frequencies.
- the method is based on the 'missing fundamental' psycho-acoustic observation that low pitches can be inferred by the human auditory system from upper harmonics even when the fundamental and the first harmonics themselves are missing.
- the basic method of functionality is to analyze the bass frequencies present in the audio and generate audible upper harmonics that aid the perception of the missing lower frequencies.
- a main feature of virtual bass is that it enhances the perceived bass response on devices with small speakers by synthesizing upper harmonics for frequencies below the low-frequency roll-off of the device (e.g., below 150 Hz).
- FIG. 1A shows the frequency-amplitude spectrum of an audio signal having an inaudible range 10 of frequency components, and an audible range of frequency components above the inaudible range.
- Harmonic transposition of frequency components in the inaudible range 10 can generate transposed frequency components in portion 11 of the audible range, which can enhance the perceived level of bass content of the audio signal during playback.
- Such harmonic transposition may include application of multiple transposition factors to each relevant frequency component of the input audio signal to generate multiple harmonics of the component.
- the delay or latency associated with the frequency transposition function can be excessive for certain applications.
- a digital audio processing system that has a latency of 1025 samples may use a legacy virtual bass system that adds a further 3200 samples of delay. The total delay then exceeds 88 milliseconds at a sampling frequency (f_s) of 48 kHz. This amount of latency is generally problematic, and even prohibitive, for gaming and telecommunications applications, where a latency of about 100 milliseconds starts to become noticeable as audible signal delay.
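The delay figure quoted above follows from simple arithmetic. The sketch below reproduces it; the function and constant names are illustrative assumptions, and only the sample counts and the 48 kHz rate come from the text.

```python
def samples_to_ms(samples: int, sample_rate_hz: float) -> float:
    """Convert a delay expressed in samples to milliseconds."""
    return 1000.0 * samples / sample_rate_hz


SYSTEM_LATENCY_SAMPLES = 1025      # latency of the surrounding audio system (from the text)
LEGACY_VB_DELAY_SAMPLES = 3200     # extra delay of the legacy virtual bass system (from the text)
SAMPLE_RATE_HZ = 48_000

total_samples = SYSTEM_LATENCY_SAMPLES + LEGACY_VB_DELAY_SAMPLES
total_ms = samples_to_ms(total_samples, SAMPLE_RATE_HZ)
print(total_samples, round(total_ms, 2))  # 4225 88.02
```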
- FIG. 1B illustrates the delay associated with symmetric windows used in legacy virtual bass systems, as known in the prior art.
- FIG. 1B graphically illustrates the delay imposed by a second-order transposer, i.e., a transposer that generates second-order harmonics.
- the center of one of the stylized symmetric analysis windows is chosen as the time zero reference, and new input samples 104 can be added from time t_0 in the analysis phase 102, assuming a time stride S_A of the analysis windows.
- Time plot 110 shows the time stretch duality of the transposer, where t_0 is stretched to 2·t_0 in the synthesis phase 112.
- the input signal to the CQMF (Complex Quadrature Mirror Filter) analysis stage and the output signal from the CQMF synthesis stage generally both have the same sampling frequency f_s, where f_s is usually set to 44.1 or 48 kHz.
- the input signal sampling rate to the virtual bass process may be f_s/64 since the system is usually processing only the first CQMF signal from a 64-channel CQMF bank. It should be noted that CQMF sizes other than 64 channels could also be used.
- the transposed output from the legacy virtual bass processing system has a sampling frequency of 2·f_s/64 because the combined transposition function uses a base transposition factor of two, resulting in a factor two bandwidth expansion.
- the base transposition factor is the factor where the source transform bins (or frequency bands) are mapped in a one-to-one relationship to the target transform bins (or frequency bands), i.e., there is no interpolation or decimation involved in the source to target bin mapping.
- the base transposition factor also governs the relation between the time strides of the analysis and synthesis windows. More specifically, the synthesis time stride equals the analysis time stride multiplied by the base transposition factor.
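The sampling-rate and time-stride relations described above can be checked numerically. This is a minimal sketch: the function names and the example analysis stride are assumptions, and the output-rate relation is applied only to the legacy factor-two case, which is the case the text states.

```python
def legacy_virtual_bass_rates(fs_hz: float, cqmf_channels: int = 64) -> tuple[float, float]:
    """Return (input_rate, output_rate) in Hz for the legacy second-order case."""
    input_rate = fs_hz / cqmf_channels   # only the first CQMF channel is processed
    output_rate = 2 * input_rate         # factor-two bandwidth expansion
    return input_rate, output_rate


def synthesis_stride(analysis_stride: int, base_factor: int) -> int:
    """Synthesis time stride = analysis time stride * base transposition factor."""
    return analysis_stride * base_factor


input_rate, output_rate = legacy_virtual_bass_rates(48_000)
print(input_rate, output_rate)                              # 750.0 1500.0
print(synthesis_stride(analysis_stride=4, base_factor=2))   # 8 (stride of 4 is hypothetical)
```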
- Embodiments include a latency reduction system in a virtual bass processing system that performs harmonic transposition on low frequency components of an audio signal to generate transposed data indicative of harmonics.
- the harmonic transposition process uses a base transposition factor greater than two, and generates the harmonics in response to frequency-domain values determined by transform and inverse transform stages that use asymmetric analysis and synthesis windows.
- An enhanced audio signal is generated by combining a virtual bass signal with the delayed audio signal through the use of Nyquist analysis filter banks that comprise truncated prototype filters.
- the virtual bass signal may be allowed to lag the delayed audio signal by a defined time period when combining with the audio signal to further reduce the latency caused by the harmonic transposition process.
- Embodiments include a method of reducing latency in a virtual bass generation system by performing harmonic transposition on low frequency components of an input audio signal to generate transposed data indicative of harmonics, wherein the harmonic transposition uses a base transposition factor of an integer value greater than two. The method generates the harmonics in response to frequency-domain values determined by a time-to-frequency domain transform stage and a subsequent inverse frequency-to-time domain transform stage, using asymmetric analysis and synthesis windows for the time-to-frequency domain transform and the inverse frequency-to-time domain transform.
- the input audio signal is a sub-banded CQMF (complex-valued quadrature mirror filter) signal and samples of the input audio signal may be pre-processed to generate critically sampled audio indicative of the low frequency components.
- the method processes the input audio signal through an analysis filter bank or transform to provide a set of analysis sub-band signals or frequency bins from the low frequency components, computes a set of synthesis sub-band signals or frequency bins using the base transposition factor B and transposition factor T, and processes the synthesis sub-band signals or frequency bins through a synthesis filter bank or transform to generate a high frequency component.
- the method may further include generating a virtual bass signal in response to the transposed data, and generating an enhanced audio signal by combining the virtual bass signal with the input audio signal by applying one or two analysis filter banks to the virtual bass audio output signal, wherein the analysis filter banks comprise truncated prototype filters that have a defined number of filter coefficients removed.
- the method may yet further include a lag of the virtual bass signal by a pre-defined time period relative to the input audio signal, by combining the virtual bass signal with the input audio signal delayed a pre-defined time period shorter than the processing delay of the virtual bass system would imply, to generate an enhanced audio signal comprising time lagged virtual bass processed sub-band samples combined with delayed input sub-band samples.
- the base transposition factor under some embodiments extends the input audio signal in the frequency domain to a degree proportionate to the value of the base transposition factor to produce a transposed audio signal, and this base transposition factor may be an even integer value between 4 and 16.
- the analysis filter banks operating on the transposer CQMF output sub bands comprise an eight-channel Nyquist filter bank and a four-channel Nyquist filter bank, and the defined number of removed prototype filter coefficients comprises six coefficients.
- the input CQMF signal is routed directly from a preceding CQMF analysis bank channel 0 output, hence bypassing a subsequent Nyquist filter bank stage and so avoiding the related delay.
- Embodiments of the method may further include generating the low frequency components by performing a frequency domain oversampled transform on the input audio signal by generating windowed and zero-padded samples at a defined sample frequency (using the analysis time stride).
- the pre-defined time period when combining the virtual bass signal with the delayed input audio signal may be a value selected from the range of 0 samples to 1000 samples, since the virtual bass signal may be allowed to lag the wide band input audio signal up to 20 ms without noticeable degradation of the enhanced audio signal.
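The effect of the permitted lag can be sketched as follows: the input signal is delayed by less than the full virtual bass processing delay, and the difference is the lag of the virtual bass signal. The function names and the 3200-sample processing delay used in the example are assumptions for illustration; only the 0 to 1000 sample range comes from the text.

```python
def lag_in_ms(lag_samples: int, sample_rate_hz: float) -> float:
    """Express the allowed virtual bass lag in milliseconds."""
    return 1000.0 * lag_samples / sample_rate_hz


def input_delay_samples(vb_processing_delay: int, allowed_lag: int) -> int:
    """Delay applied to the input signal when the virtual bass may lag by allowed_lag."""
    if not 0 <= allowed_lag <= 1000:
        raise ValueError("allowed_lag is expected in the range 0..1000 samples")
    return max(vb_processing_delay - allowed_lag, 0)


print(round(lag_in_ms(1000, 48_000), 2))    # 20.83
print(input_delay_samples(3200, 1000))      # 2200 (3200 is a hypothetical processing delay)
```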
- the asymmetric analysis and synthesis windows are configured such that a longer portion of each analysis window is stretched toward past input samples, and a longer portion of each synthesis window is stretched toward future output samples.
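One way to picture such windows is to join a long rising half-window to a short falling half-window. The sketch below does this with raised-cosine halves; this construction, the function name, and the lengths 48 and 16 are assumptions chosen for illustration, not the window shapes defined by the patent.

```python
import math


def asymmetric_window(long_len: int, short_len: int) -> list[float]:
    """Long raised-cosine rise (past samples) followed by a short fall (newest samples)."""
    rise = [0.5 - 0.5 * math.cos(math.pi * (n + 0.5) / long_len) for n in range(long_len)]
    fall = [0.5 + 0.5 * math.cos(math.pi * (n + 0.5) / short_len) for n in range(short_len)]
    return rise + fall


# Index 0 is the oldest sample. The analysis window peaks late, so little look-ahead is needed.
analysis = asymmetric_window(long_len=48, short_len=16)
# The synthesis window is the time reverse: its longer portion extends toward future outputs.
synthesis = list(reversed(analysis))

print(analysis.index(max(analysis)), synthesis.index(max(synthesis)))  # 47 16
```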
- Embodiments are also directed to systems or apparatus elements configured to implement at least some of the methods described above.
- Embodiments of systems and methods are described for reducing latency and algorithmic delays in transposer-based virtual bass systems.
- Such systems and methods utilize higher-order base transposition factors, low latency asymmetric transform windows, truncated Nyquist prototype filters, a virtual bass signal that is time lagged with respect to the original audio signal, and a bypassed Nyquist analysis filter bank in a preceding Hybrid filter bank stage.
- the expression performing an operation "on" a signal or data is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing prior to performance of the operation thereon).
- the expression "transposer" is used in a broad sense to denote an algorithmic unit or device that performs pitch-shifting or time-stretching of a real or complex-valued input signal, for parts of, or the entire available input signal spectrum.
- the expressions "transposer", "harmonic transposer", "phase vocoder", "high frequency generator", and "harmonic generator" may be used interchangeably.
- the expression "system" is used in a broad sense to denote a device, system, or subsystem.
- a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X - M inputs are received from an external source) may also be referred to as a decoder system.
- the expression "processor" is used in a broad sense to denote a system or device programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio, or video or other image data).
- examples of processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general purpose processor or computer, and a programmable microprocessor chip or chip set.
- the expressions "audio processor" and "audio processing unit" are used interchangeably, and in a broad sense, to denote a system configured to process audio data.
- examples of audio processing units include, but are not limited to, encoders (e.g., transcoders), decoders, vocoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).
- Embodiments are directed to systems and methods of decreasing virtual bass delay without requiring substantial changes to existing virtual bass processing components, such as the harmonic transposer used in a virtual bass processing system.
- Aspects of the virtual bass latency reduction system and method may be used in conjunction with a harmonic generator (transposer) in audio codecs (e.g., in a decoder).
- Aspects of the virtual bass latency reduction system and method may also be used in conjunction with other transposer or phase vocoder systems, e.g., traditional phase vocoders used for general time-stretching or pitch-shifting of audio signals.
- virtual bass generation methods using harmonic transposition involve the transposition of frequency components from an inaudible frequency range to an audible frequency range in order to improve playback of bass content in limited playback equipment, such as through small speakers that cannot physically reproduce the missing lower frequencies.
- Embodiments of the virtual bass latency reduction system and method improve upon virtual bass generation methods that perform harmonic transposition on low frequency components of an audio signal to generate transposed data indicative of harmonics that are expected to be audible during playback, generate a virtual bass signal in response to the transposed data, and generate an enhanced audio signal by combining the virtual bass signal with the (delayed) input audio signal.
- the enhanced audio signal provides an increased perceived level of bass content during playback of the enhanced audio signal by one or more loudspeakers that cannot physically reproduce the low frequency components.
- the harmonic transposition performed by the virtual bass generation method employs combined transposition to generate harmonics using a second-order transposer and at least one higher order transposer (typically, a third-order and a fourth-order, and optionally at least one additional higher order transposer) of each of the low frequency components, such that all of the harmonics are generated in response to frequency-domain values determined by a common time-to-frequency domain transform stage (e.g., by performing phase multiplication or other manipulation of the phase on frequency coefficients resulting from a single time-to-frequency domain transform), followed by a common frequency-to-time domain transform (in practice, the common frequency-to-time domain transform is split up into two smaller transforms in order to adapt to the bandwidths and sampling frequencies of the sub-bands of the CQMF framework).
- FIG. 2 is a block diagram of a virtual bass processing system that implements or is used in conjunction with certain latency reduction processes under an embodiment.
- the virtual bass processing system 200 takes as input 201 (input A), a plurality of complex-valued sub-band samples (HQMF samples) from a so-called Hybrid filter bank.
- a Hybrid filter bank preceding the virtual bass process has separated an original time domain audio input signal into such multiple Hybrid sub-bands 201 (which are described in further detail below), and they may be buffered by input buffers 206.
- the buffered input is then processed by a Nyquist synthesis filter bank 208 that performs the synthesis function in order to reconstitute a single complex-valued QMF (CQMF) domain signal 202 (signal C) indicative of low frequency audio content (e.g., between 0 and 375 Hz).
- the virtual bass system includes a latency saving mechanism by bypassing the Nyquist analysis filter bank stage in the preceding Hybrid filter bank. This allows the system to save the delay associated with the Nyquist analysis bank (e.g., 384 samples) by feeding the CQMF channel 0 signal as input 203 (input B) directly to the virtual bass module.
- one of the two inputs 202 or 203 is chosen by a switch, such as selector 204, and the selected signal comprises a virtual bass input signal 205 (signal D) that is further processed by the transposer 209.
- a transposer is generally the combination of a time-to-frequency transform or a filter bank, followed by a non-linear stage (performing phase multiplication or phase shifting), followed by a frequency-to-time transform or filter bank.
- transposer 209 comprises a time-to-frequency transform component 210, a non-linear stage 212, and a frequency-to-time transform 214.
- the non-linear stage 212 within transposer 209 is a processing block that modifies the phase and applies certain gain (amplitude) control signals to the sub-band or transform components of the signal.
- the transposed signals are then buffered by output buffers 216 and subsequently processed by Nyquist analysis filter banks 218 that perform the analysis function that decomposes the virtual bass output CQMF signals into sub-bands corresponding to the Hybrid sub-band samples (HQMF) of the input signal 201.
- a delayed and unprocessed version of the input A signal 220 is mixed with the Nyquist filter bank 218 output to produce an enhanced audio output signal 222 comprising the virtual bass output signal plus the delayed input signal.
- although embodiments may be directed to the use of Nyquist filter banks for certain functions, such as synthesis 208 and analysis 218 stage processing, it should be noted that other types of filter banks or frequency splitting or partitioning circuits and techniques may also be used. In other embodiments, the above mentioned filter banks or frequency splitting or partitioning circuits and techniques may not be present at all.
- FIGS. 3A-C are more detailed diagrams of the virtual bass processing system illustrated in FIG. 2 .
- FIG. 3A illustrates a pre-processing Hybrid filter bank stage 300, that is, a stage that typically is not part of, but instead precedes the virtual bass system.
- a Hybrid filter bank may be the combination of a CQMF bank and Nyquist filter banks, where a certain number of the lowest CQMF bands are processed by Nyquist filter banks of pre-determined sizes in order to increase the frequency resolution of the low frequency range.
- the combination of low frequency sub-band samples from the Nyquist analysis stages and the remaining CQMF channels is referred to as Hybrid sub-band samples, or an HQMF (Hybrid QMF) signal.
- a time domain input signal 302 is input to a 64-channel CQMF analysis filter bank 304.
- the CQMF channel 0 (denoted signal B) 306 may be routed directly to the virtual bass module 330 of FIG. 3C (this signal corresponds to input B 203 of FIG. 2 ).
- the signal B 306 bypasses the Nyquist analysis filter bank 307, and hence avoids the associated delay.
- CQMF channels 0, 1, and 2 are also input to a number of Nyquist analysis filter banks 307-309. The output from the Nyquist analysis filter banks and the remaining CQMF sub-bands (3 to 63) produce the Hybrid sub-band samples 0-76 (denoted as signal A) 310.
- a plurality of complex-valued Hybrid sub-band samples (signal A) 322 are input to a Nyquist synthesis filter bank stage 324.
- the virtual bass module 330 of FIG. 3C is assumed to be one module amongst other modules in a system that operates on Hybrid sub-band samples (HQMF samples).
- signal A 310 of FIG. 3A may undergo processing by other modules after the pre-processing filter bank stage 300 before becoming input A 322 of FIG. 3B .
- the first 8 Hybrid sub-bands, i.e., the sub-bands from the low frequency, eight-channel (8-ch) Nyquist filter bank 307 (which produce a signal bandwidth of roughly 344-375 Hz depending on the sampling rate), are processed. Because a Nyquist filter bank, in contrast to the CQMF bank, is not down-sampled, the Nyquist filter bank synthesis step is particularly straightforward: it is just a summation of the sub-band samples for each CQMF (or HQMF) time slot. After summation of the eight lowest Hybrid sub-band samples in stage 324, the system has reconstituted the CQMF channel 0 signal C 326, which becomes input 332 to the virtual bass module 330 of FIG. 3C .
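The summation step and the quoted bandwidth range can be sketched in a few lines. The function names and the sample values are illustrative assumptions; the bandwidth formula assumes that each of the 64 CQMF channels covers an equal share of the range up to f_s/2.

```python
def nyquist_synthesis(hybrid_slot: list[complex]) -> complex:
    """Reconstitute one CQMF channel 0 sample by summing the Hybrid sub-band samples of a time slot."""
    return sum(hybrid_slot, 0j)


def cqmf_channel_bandwidth_hz(fs_hz: float, channels: int = 64) -> float:
    """Bandwidth of one CQMF channel, assuming `channels` equal bands up to fs/2."""
    return fs_hz / (2 * channels)


# One time slot holding eight hypothetical complex sub-band samples.
slot = [complex(0.1 * k, -0.05 * k) for k in range(8)]
cqmf_sample = nyquist_synthesis(slot)

print(cqmf_channel_bandwidth_hz(44_100), cqmf_channel_bandwidth_hz(48_000))  # 344.53125 375.0
```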
- FIG. 3C illustrates a virtual bass system that implements or is used in conjunction with certain latency reduction processes, under an embodiment.
- the virtual bass module 330 of FIG. 3C has signal D 332 as input.
- signal D 332 may be routed from signal B 306 of FIG. 3A .
- signal D 332 may be fed from signal C 326 of the Nyquist synthesis stage 320 of FIG. 3B .
- signal D 332, i.e., the input signal to the virtual bass module is a single complex-valued CQMF signal (e.g., the first channel (channel 0) from a set of CQMF sub-band signals).
- an optional dynamics processing function may be performed by dynamics processor 336 in order to change the dynamics of the virtual bass input signal.
- the processor 336 may be used to decrease the level of weak bass and maintain or enhance strong bass, i.e., be used as an expander. This scheme agrees with the shapes of the Equal Loudness Contours (ELC) in the bass range, where the loudness curves are flatter in frequency for louder signals and steeper for signals of weaker loudness. Weaker bass can hence be attenuated more than stronger bass when generating harmonics in order to maintain the relative loudness between the fundamental component and the generated harmonics.
- the gain of the dynamics processor 336 may be controlled by a running average energy signal, e.g., the running average energy of a down-mixed (mono) version of the first CQMF band signal 332.
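An expander driven by a running average energy could look like the sketch below. Everything specific here is an assumption made for illustration: the one-pole energy smoother, the threshold, the expansion ratio, and the function name are not taken from the patent, which does not specify the gain law in this section.

```python
def expander_gains(samples: list[complex], smoothing: float = 0.9,
                   threshold: float = 0.01, ratio: float = 2.0) -> list[float]:
    """Per-sample gains of a downward expander controlled by a running average energy."""
    gains = []
    energy = 0.0
    for x in samples:
        # One-pole running average of the instantaneous energy |x|^2.
        energy = smoothing * energy + (1.0 - smoothing) * abs(x) ** 2
        if energy >= threshold:
            gains.append(1.0)  # strong bass is left unchanged
        else:
            # Weak bass is attenuated further the lower its energy is.
            gains.append((energy / threshold) ** ((ratio - 1.0) / 2.0))
    return gains


strong = expander_gains([1.0 + 0j] * 5)
weak = expander_gains([0.01 + 0j] * 5)
print(strong[-1], weak[-1] < 1.0)
```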
- a first windowing function using a window size L (including zero-padding up to length N) 338, a forward FFT 340, and a modulation function 342 are performed on the (possibly dynamics processed) CQMF signal prior to input to the non-linear processing block 344.
- the window shape is asymmetric.
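The windowing, zero-padding, and forward transform steps can be sketched as follows. This is an illustration under assumptions: the zeros are appended after the windowed frame, a naive DFT stands in for the FFT, the modulation step is omitted, and the function names are hypothetical.

```python
import cmath


def window_and_pad(frame: list[complex], window: list[float], n_fft: int) -> list[complex]:
    """Apply a length-L window and zero-pad the result to the transform size N."""
    if len(frame) != len(window):
        raise ValueError("frame and window must both have length L")
    if n_fft < len(frame):
        raise ValueError("transform size N must be at least L")
    windowed = [x * w for x, w in zip(frame, window)]
    return windowed + [0j] * (n_fft - len(windowed))


def dft(x: list[complex]) -> list[complex]:
    """Naive O(N^2) discrete Fourier transform, used here in place of an FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


# L = 4 samples zero-padded to N = 8 gives a factor-two oversampling in frequency.
padded = window_and_pad([1 + 0j] * 4, [1.0] * 4, n_fft=8)
spectrum = dft(padded)
```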
- the transposer (comprising components 338 to 356) represents an improved phase vocoder that uses an interpolation technique referred to as "combined transposition" to generate second, third, fourth, and possibly higher order harmonics (transposition factors), using the same FFT analysis/synthesis chain as for the base transposer.
- the non-linear processing block 344 uses integer transposition factors, which makes redundant certain phase estimation, phase unwrapping, or phase locking techniques that are generally unstable and inexact as used in many standard phase vocoders.
- the phase multipliers 344 use a base transposition factor B higher than 2, such as 8, or any other appropriate value.
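The reason integer factors avoid phase unwrapping can be seen in a short sketch: multiplying the phase of a complex bin by an integer is the same as raising its unit-magnitude part to that integer power, so the wrapped phase never has to be estimated explicitly. The function name is hypothetical, and keeping the magnitude unchanged is an assumption; the sketch shows only the phase multiplication idea, not the full combined transposition.

```python
import cmath


def phase_multiply(bin_value: complex, factor: int) -> complex:
    """Multiply the phase of a frequency bin by an integer factor, keeping its magnitude."""
    magnitude = abs(bin_value)
    if magnitude == 0.0:
        return 0j
    unit = bin_value / magnitude          # unit-magnitude phasor exp(j*phi)
    return magnitude * unit ** factor     # exp(j*factor*phi), no unwrapping needed


z = cmath.rect(2.0, 0.3)                  # magnitude 2.0, phase 0.3 rad
y = phase_multiply(z, 3)
print(round(abs(y), 6), round(cmath.phase(y), 6))  # 2.0 0.9
```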
- the transposer 338-356 uses oversampling in the frequency domain (i.e., zero-padded analysis and synthesis windows in blocks 338 and 356) to improve impulsive (percussive) sounds, which is paramount when used in the bass frequency range. Without such oversampling, percussive drum sounds would likely generate at least some pre- and post-echo artifacts, making the bass blurry and indistinct.
- the transposer includes gain and slope compensation per FFT bin applied by amplifiers 346 following the phase multiplier circuits (non-linear processing block 344).
- This allows overall gains for different transposition factors to be set independently. For example, gains can be set to approximate certain equal loudness contours (ELC).
- the ELC can be adequately modeled by straight lines on a logarithmic scale for frequencies below 400 Hz.
- odd order harmonics (e.g., third, fifth, etc.) can be attenuated to a greater extent, since they can sometimes be perceived as harsher than even order harmonics, although they are important for the resulting virtual bass effect.
- Each transposed signal may additionally have a slope gain, i.e., a roll-off attenuation factor, measured in e.g., dB per octave. This attenuation is also applied per bin in the transform domain by amplifiers 346.
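A per-bin gain that combines an overall gain with a slope in dB per octave could be computed as in the sketch below. The reference frequency, the parameter names, and the example values are assumptions for illustration; a constant dB-per-octave slope corresponds to a straight line on a logarithmic frequency scale.

```python
import math


def bin_gain(freq_hz: float, ref_hz: float, overall_gain_db: float,
             slope_db_per_octave: float) -> float:
    """Linear gain for one bin: overall gain plus a slope measured in dB per octave."""
    octaves = math.log2(freq_hz / ref_hz)            # distance from the reference in octaves
    gain_db = overall_gain_db + slope_db_per_octave * octaves
    return 10.0 ** (gain_db / 20.0)


# Hypothetical settings: 0 dB at 100 Hz, rolling off by 6 dB per octave.
print(bin_gain(100.0, 100.0, 0.0, -6.0))             # 1.0
print(round(bin_gain(200.0, 100.0, 0.0, -6.0), 4))   # 0.5012
```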
- in a non-Hybrid filter bank based system, e.g., a time domain system taking signal 302 of FIG. 3A as input, the transposer 338-356 would operate directly on a time domain signal of full sampling rate (e.g., 44.1 or 48 kHz) and would then employ an FFT size of roughly 4096 lines in order to provide adequate resolution in the low frequency (bass) range. In an embodiment, however, all processing is performed on CQMF channel 0 sub-band samples (signal D 332 of system 330). This provides certain advantages over normal processing practices, such as saving computational complexity by processing only the signal of interest in the transposer, i.e., by processing a critically sampled (or maximally decimated) low-pass signal.
- the virtual bass system expands the bandwidth of the input signal by a factor of four.
- a virtual bass system is not required to output a signal with a bandwidth above roughly 500 Hz.
- the system can process the complex-valued samples using an FFT transform of size 64 (4096/64) instead of 4096, where the decrease by 64 comes from the down-sampling factor of the CQMF bank, which also equals the reduced bandwidth of the first CQMF sub-band signal compared to the time domain input signal.
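The size reduction can be checked with a few lines of arithmetic. The sketch below assumes the 48 kHz, 64-channel CQMF example from the text; the helper name `bin_spacing_hz` is hypothetical. It illustrates that a 64-line transform on the complex-valued channel 0 signal (sampled at fs/64) has the same bin spacing as a 4096-line transform at the full rate.

```python
# Illustrative values taken from the example in the text (assumed, not normative).
FS_HZ = 48_000          # full sampling rate
CQMF_CHANNELS = 64      # down-sampling factor of the CQMF bank
FULL_RATE_FFT = 4096    # FFT size a time domain system would need


def bin_spacing_hz(sample_rate_hz, fft_size):
    """Frequency distance between adjacent FFT lines."""
    return sample_rate_hz / fft_size


# Channel 0 of the CQMF bank is complex-valued and sampled at fs / 64.
subband_rate_hz = FS_HZ / CQMF_CHANNELS
subband_fft = FULL_RATE_FFT // CQMF_CHANNELS

full_rate_spacing = bin_spacing_hz(FS_HZ, FULL_RATE_FFT)
subband_spacing = bin_spacing_hz(subband_rate_hz, subband_fft)

print(subband_fft, full_rate_spacing, subband_spacing)
```

Both spacings come out identical, which is why the smaller transform does not cost resolution in the bass range under these assumptions.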
- the output from the transposer needs to be transformed to CQMF bands 0 and 1. This may be done approximately by a split of the 64-line FFT into four 16-line FFTs and subsequently employing CQMF prototype filter response compensation in the transform domain before the inverse FFT of the two 16-line FFTs that constitute CQMF band 0 and 1 are calculated.
- the FFT spectrum may be split in module 348 of the virtual bass module 330 and the CQMF filter response compensation may be done by multipliers 350.
- the CQMF filter response compensation may be done on the full (e.g., 64-lines in the example above) FFT spectrum before the FFT split module 348.
- the output from the CQMF filter response compensation blocks 350 is input to modulation steps 352 followed by inverse FFT circuits 354, using transform sizes of N / B points, and subsequent windowing and overlap/add steps 356, using window lengths L / B .
- the window shapes are asymmetric.
- the modulation steps 352 may also be applied before the FFT split 348 and CQMF filter response compensation 350 blocks.
- the output signals from the windowing and overlap/add circuits 356 are two CQMF signals, containing the virtual bass signal to be mixed with the delayed HQMF signal A 364. However, both signals need first be filtered through 8- and 4-channel Nyquist analysis filter banks 360 respectively to fit in the Hybrid domain.
- the Nyquist analysis filter banks 360 use truncated prototype filters.
- the HQMF output from the filter banks 360 may be band pass filtered and mixed with a delayed input component A 364 in module 362 to produce the enhanced audio output HQMF signal 366.
- the delay of input A 364 to the Hybrid band mix block 362 is less than the virtual bass system delay (minus the Nyquist analysis delay if signal B 306 is used as input) to comprise a time lagged virtual bass signal.
- system 330 employs phase compensation by an exp(-jπ/2) multiplication 358 on the CQMF channel 1 before the Nyquist analysis blocks 360.
- the specific argument to the phase compensation function 358 is dependent on the modulation scheme used by the preceding CQMF bank 304 of FIG. 3A and may differ between embodiments. Also, the compensation factor 358 may be moved and absorbed in other processing blocks.
- the virtual bass processing system introduces certain delays when processing the input signal.
- the total delay of the transposer and the Nyquist filter bank analysis stage can be in the order of 3200 samples, as described previously.
- the virtual bass processing system includes components that perform certain steps to reduce the latency associated with virtual bass processed content.
- FIG. 4 is a block diagram of the principal functional components utilized by a virtual bass latency reduction process and system, under an embodiment.
- the latency reduction process comprises the use of higher order base transposition factors 402, low-latency asymmetric transform windows 404, truncated Nyquist prototype filters 406, and a time lagged virtual bass signal 408.
- Each of the functional components of diagram 400 may be used alone or in conjunction with one or more of the other components to help reduce the latency of the virtual bass processed content.
- Diagram 400 may represent a system, such as when each of the components 402-408 is embodied as a hardware component, such as a circuit, a processor, and so on.
- the diagram may also represent a process, such as when each of the components 402-408 is implemented as an act performed by a functional component, such as a computer-implemented process executed by one or more processors.
- diagram 400 may represent a hybrid system and method wherein certain components may be implemented in hardware circuitry and others may be implemented as performed method steps.
- the components 402-408 may be implemented as separate stand-alone components, or they may be combined in one or more consolidated latency reduction functions. A detailed description of the composition and operation of each component of system 400 follows below.
- FIG. 5A is a table illustrating the delay associated with a first hop size
- FIG. 5B is a table illustrating the delay associated with a second hop size for a virtual bass latency reduction system under an embodiment.
- L 16 to 128
- the transposer source ranges are smaller than the transposer target ranges in the analysis transform spectrum.
- the target bins result from interpolation of the source bins.
- the source ranges will be larger than the target ranges and the target bins result from decimation of source bins.
- the increased order of the base transposition factor has certain implications on the virtual bass process.
- the transposer output inherently covers a frequency range of B CQMF bands (assuming an input of one CQMF band), where only the first two will actually be synthesized, thus saving complexity.
- B 8
- F 4
- the quality of the transposed signals is governed by the base transposition factor and gets reduced for higher order transposition orders, but can be improved by using a decreased analysis hop-size (increased oversampling in the time domain). Moreover, to maintain the quality for percussive sounds (transients), the order of frequency domain oversampling needs to increase for higher base transposition factors. However, the increased oversampling in both time and frequency may add to the computational complexity of the transposer.
- the analysis hop-size is decreased by a factor of two compared to the legacy system.
- the latency reduction system uses asymmetric analysis and synthesis windows in the forward and inverse transform stages (e.g., windowing stages 338 and 356 of FIG. 3C , respectively). This essentially improves the frequency response of a symmetric window of limited length by extending the "tail" of the window towards samples in the history not contributing to the transform delay.
- both the length of the analysis window and the size of the forward transform may be different from that of the synthesis window and the inverse transform.
- FIG. 5C is an example plot of a time response of an asymmetric window compared to legacy symmetric Hanning windows.
- FIG. 5C illustrates the time response as a function of samples (x-axis) versus signal amplitude (e.g., in volts) for a Hanning window of length 64 shown as plot 514 and a Hanning window of length 41 shown as plot 516 versus the time response plot 512 for an asymmetric window of length 64 and delay 40 (a delay equal to the Hanning window of length 41).
- FIG. 5D is an example plot of frequency responses of an asymmetric window compared to legacy symmetric Hanning windows.
- FIG. 5D illustrates the frequency response as a function of normalized frequency (x-axis) versus signal amplitude on a logarithmic scale (e.g., in dB) for the Hanning window of length 64 shown as plot 524 and the Hanning window of length 41 shown as plot 526 versus the frequency response plot 522 for the asymmetric window of length 64 and delay 40 (equal to the Hanning window of length 41).
- the main lobe of the asymmetric window has a width in between those of the symmetric Hanning windows, indicating a frequency resolution or selectivity in between the two Hanning windows.
- the transposer algorithm needs to be partially changed compared to the legacy implementation, taking into account the reduced transform delay D of the analysis/synthesis chain.
- $M_S(n) = e^{-i \frac{\pi}{N} D n}, \quad 0 \le n < N$
- k and n respectively are the transform frequency coefficient indices
- F is the frequency domain oversampling factor
- L is the analysis window size
- D is the transform delay.
- the modulation of Eq. 5 may also be applied in modulation stages 352 after the FFT split module 348 and response compensation step 350.
- FIG. 6 illustrates stylistically the use of asymmetric windows and the associated delay imposed by a B -order base transposer, under an embodiment.
- Time plot 600 shows the time zero reference as the group delay of the analysis window (approximately D/2). New samples 604 are added from time t0 in the analysis phase 602.
- Time plot 610 shows that the time stretch duality of the transposer moves t0 to time B·t0 in the synthesis phase 612 for the new time-stretched samples 614.
- the total analysis/synthesis chain delay amounts approximately to D/2 + B·(D/2 - S_A) in the case where asymmetric windows, such as shown in FIG. 5C (plot 512) or FIG. 6, are used.
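The delay expression above can be written as a small helper. This is a minimal sketch, not the patented implementation: the function names are hypothetical, and the conversion to output samples assumes the transposer output runs at B·fs/64, generalizing the 2·fs/64 rate the description gives for the legacy case. As a sanity check it uses the legacy example (L = 64, SA = 4, B = 2) and treats the symmetric window as having D = L, which is also an assumption.

```python
def chain_delay(transform_delay, base_factor, analysis_hop):
    """Approximate analysis/synthesis chain delay D/2 + B * (D/2 - S_A).

    The result is in samples at the transposer output rate.
    """
    half = transform_delay / 2
    return half + base_factor * (half - analysis_hop)


def to_output_samples(delay, base_factor, cqmf_channels=64):
    """Convert transposer-rate samples to full-rate output samples.

    Assumes the transposer output rate is base_factor * fs / cqmf_channels,
    so each transposer sample spans cqmf_channels / base_factor output samples.
    """
    return delay * cqmf_channels / base_factor


# Legacy example from the background: symmetric window, D taken as L = 64.
legacy = chain_delay(transform_delay=64, base_factor=2, analysis_hop=4)
legacy_output = to_output_samples(legacy, base_factor=2)
print(legacy, legacy_output)
```

Under these assumptions the legacy case evaluates to 88 transposer samples, i.e. 2816 output samples, which agrees with the legacy transposer delay quoted in the background.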
- the calculations of Eqs. 4 and 5 above may likewise be implemented by circular time shifts of N - (D/2 - (L - 1)) (mod N) samples before the analysis transform and N - D/2 samples after a (single) synthesis transform, respectively.
- B 8
- the time shifts after the synthesis transforms will be (N - D/2)/B samples, which may not be an integer value. In this case, a rounded value may be used as an approximation.
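The interchangeability of transform-domain modulation and circular time shifts follows from the DFT shift theorem. The sketch below demonstrates only that general property with a naive pure-Python DFT; it does not reproduce the sign or direction conventions of Eqs. 4 and 5, and the signal, the size N = 16 and the shift of 5 samples are arbitrary illustrative choices.

```python
import cmath


def dft(x):
    """Naive forward DFT (O(n^2)), adequate for a small demonstration."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive inverse DFT matching dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_shift(x, m):
    """Return y with y[t] = x[(t - m) mod n]."""
    n = len(x)
    return [x[(t - m) % n] for t in range(n)]


def shift_by_modulation(x, m):
    """Apply the same shift by multiplying bin k with exp(-i*2*pi*m*k/n)."""
    n = len(x)
    X = dft(x)
    Y = [X[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)]
    return idft(Y)


N = 16
signal = [float(t * t % 7) for t in range(N)]
shift = 5

direct = circular_shift(signal, shift)
modulated = shift_by_modulation(signal, shift)
max_error = max(abs(p - q) for p, q in zip(direct, modulated))
print(max_error)
```

The equivalence holds exactly only for integer shifts, which is consistent with the remark that a rounded shift is an approximation when the required value is not an integer.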
- the analysis modulation may be combined with the synthesis modulation as a merged synthesis modulation as given by Eq. 6:
- $M_{ASC}(k) = e^{-i \frac{2\pi}{N} \left( \frac{D}{2}(B+1) - (L-1)B \right) k}, \quad 0 \le k < N$
- T the transposition factor
- Eq. 6 will also be an approximation.
- g x ( m ) is the time-domain output from one of the synthesis inverse transforms
- Eq. 7 provides only an approximation of the frequency modulation implemented by Eq. 6 (which in itself may be an approximation) when the argument to the ceil-function ⌈·⌉ (rounding up to the closest integer) is not an exact integer. It should also be noted that Eqs. 5 or 6 above are preferably applied only to the limited part of the coefficients that will be included in the two inverse Fourier transforms.
- Eq. 8 refers to the delay in output samples using a 64-channel CQMF based framework.
- FIG. 7A is a table illustrating the total latency values for a first hop size
- FIG. 7B is a table illustrating the total latency values for a second hop size for a virtual bass latency reduction system that uses asymmetric transform windows, under an embodiment.
- the amount of asymmetry of the transposition windows may vary depending upon the constraints and requirements of the system.
- the group delay of the asymmetric window is selected to be close to half of the transform delay in order to maintain adequate transposition quality.
- $G_d \approx D/2 = 20$. This may be accomplished by including a constraint for the group delay during an optimization phase for design of the asymmetric filter.
- a third latency reduction element comprises using truncated Nyquist prototype filters, 406.
- 8-channel and 4-channel Nyquist analysis filter banks 360 are applied to the virtual bass output CQMF channels (these filter banks correspond to the Nyquist filter banks 307 and 308 of FIG. 3A ).
- this entire delay (e.g., 384 samples) may be eliminated.
- the Nyquist analysis/synthesis chain still provides perfect reconstruction. However, the frequency responses of the Nyquist filter banks using truncated filters may change. Optimization of the remaining filter coefficients may improve the potentially poorer frequency responses of the Nyquist filter banks using truncated filters.
- a fourth latency reduction element comprises letting the virtual bass signal lag the original signal, 408.
- the latency of the overall system can be reduced as the wide band signal (i.e., the Hybrid signal A 364 of FIG. 3C ) is delayed a shorter period of time than the virtual bass system delay actually implies.
- Informal listening tests have shown that a lag below 20 ms does not hamper the virtual bass effect. This lag corresponds to 960 samples for a 48 kHz audio signal.
- the virtual bass signal is allowed to lag the wide band signal by a total of 352 samples (7.33 ms at 48 kHz).
- Of the 352 samples, 32 samples come from the use of the asymmetric transform window, as 1376 is not evenly divisible by the CQMF filter bank size of 64.
- the delay from the asymmetric window transform can be divided into a wide band latency of 1344 plus a bass lag of 32 samples.
- the extra lag added on top of the 32 samples is thus 320 samples (5 CQMF samples, corresponding to 6.67 ms at 48 kHz sampling frequency).
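The lag budget above is simple integer arithmetic and can be reproduced as follows. This is an illustrative sketch using only the figures quoted in the text (1376 samples, 64-channel CQMF, 5 extra CQMF samples, 48 kHz); the helper names are hypothetical.

```python
CQMF_CHANNELS = 64
FS_HZ = 48_000


def split_delay(delay_samples, block=CQMF_CHANNELS):
    """Split a delay into whole CQMF blocks (wide band latency) and a remainder (bass lag)."""
    whole_blocks, remainder = divmod(delay_samples, block)
    return whole_blocks * block, remainder


def samples_to_ms(samples, fs_hz=FS_HZ):
    """Convert a sample count to milliseconds."""
    return 1000.0 * samples / fs_hz


# Delay of the asymmetric window transform, as quoted in the text.
wide_band_latency, window_lag = split_delay(1376)

# Extra lag of 5 CQMF samples added on top of the remainder.
extra_lag = 5 * CQMF_CHANNELS
total_lag = window_lag + extra_lag

print(wide_band_latency, window_lag, extra_lag, total_lag)
print(round(samples_to_ms(total_lag), 2), round(samples_to_ms(extra_lag), 2))
```

The total lag of 352 samples stays well below the 960 samples (20 ms at 48 kHz) that the informal listening tests found acceptable.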
- the different latency reduction elements 402-408 of FIG. 4 may be used in any practical number of combinations to achieve a reduction in virtual bass system latency. Furthermore, the appropriate variables of each latency reduction method may be altered to increase the latency in relation to any perceived decrease in virtual bass signal quality.
- the delay of 640 samples in this example case is significantly less than the nominal delay of 3200 samples in the legacy virtual bass system described previously. This delay can be reduced even further by adding more virtual bass lag, by increasing the hop-size S A to 4 instead of 2, or by designing an asymmetric transform window with a resulting analysis/synthesis delay shorter than 40. However, the change of any such values may result in slightly poorer virtual bass quality, though the latency may be further reduced.
- FIG. 8 is a block diagram illustrating an audio processing system that includes a virtual bass generation system and a latency reduction system, under an embodiment.
- system 800 comprises a virtual bass system 330 as illustrated in FIG. 3C .
- Virtual bass system 330 receives input audio signals 801 and performs certain frequency transposition functions to produce enhanced audio content for playback through speakers 806 that may be of limited frequency response capability. Certain latencies may be associated with the transposition functions performed by the virtual bass system 330.
- a virtual bass latency reduction system 400 (as illustrated in FIG.
- the reduced latency audio signals from the virtual bass systems 330 and 400 are then sent to a rendering subsystem 802 that is configured to generate speaker feeds that may be fed through amplifier 804 for left and right (or multi-channel) speakers 806.
- Although the virtual bass latency reduction system 400 is shown to be a separate post-process element in system 800, it should be noted that such a latency reduction system may be implemented as part of the virtual bass system 330 (as indicated earlier), or as part of any other appropriate element of system 800, such as a functional component within rendering subsystem 802.
- the virtual bass system 330 may be a legacy virtual bass generation system as outlined in the background, or it may be any other virtual bass generation and processing system that uses harmonic transposition to enhance input audio signals 801 to increase the perceived level of bass content for playback through speakers 806.
- Embodiments of the virtual bass latency reduction system can be used in any audio processing system that renders and plays back digital audio through a variety of different playback devices and audio speakers (transducers).
- These speakers may be embodied in any of a variety of different listening devices or items of playback equipment, such as computers, televisions, stereo systems (home or cinema), mobile phones, tablets, and other portable playback devices.
- the speakers may be of any appropriate size and power rating, and may be provided in the form of free-standing drivers, speaker enclosures, surround-sound systems, soundbars, headphones, earbuds, and so on.
- the speakers may be configured in any appropriate array, and may include monophonic drivers, binaural speakers, surround-sound speaker arrays, or any other appropriate array of audio drivers.
- aspects of one or more embodiments described herein may be implemented in an audio system that processes audio signals for transmission across a network that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
- Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers.
- Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.
- One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics.
- Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
Description
- This application claims priority to United States Provisional Patent Application No. 13/652,023 filed 15 October 2012.
- One or more embodiments relate generally to transform-based audio signal processing, and more specifically to reducing latency in transposer-based virtual bass synthesis systems.
- Bass synthesis refers to methods of adding components to the low frequency range of a signal in order to enhance the perceived bass. Of these methods, a sub-bass synthesis technique creates low frequency components below the existing partials of a signal in order to extend and improve the lowest frequency range present in the subject audio content. Another method uses virtual pitch algorithms that generate audible harmonics from an inaudible bass range (e.g., low pitched bass played through small loudspeakers), hence making the harmonics, and ultimately also the pitch, audible in order to improve the bass response.
- Virtual bass synthesis is a virtual pitch method that increases the perceived level of bass content in audio when played on small loudspeakers that cannot physically reproduce the low-end bass frequencies. The method is based on the 'missing fundamental' psycho-acoustic observation that low pitches can be inferred by the human auditory system from upper harmonics even when the fundamental and the first harmonics themselves are missing. The basic method of functionality is to analyze the bass frequencies present in the audio and generate audible upper harmonics that aid the perception of the missing lower frequencies. A main feature of virtual bass is that it enhances the perceived bass response on devices with small speakers by synthesizing upper harmonics for frequencies below the low-frequency roll-off of the device (e.g., below 150 Hz). Inaudible signal components are transposed to higher audible frequencies using plural transposition factors (harmonics), followed by energy adjustment. Virtual bass synthesis may also increase the perceived bass for headphone playback or playback on full-range loudspeakers.
- FIG. 1A shows the frequency-amplitude spectrum of an audio signal having an inaudible range 10 of frequency components, and an audible range of frequency components above the inaudible range. Harmonic transposition of frequency components in the inaudible range 10 can generate transposed frequency components in portion 11 of the audible range, which can enhance the perceived level of bass content of the audio signal during playback. Such harmonic transposition may include application of multiple transposition factors to each relevant frequency component of the input audio signal to generate multiple harmonics of the component.
- In certain audio processing systems that utilize legacy virtual bass systems, the delay or latency associated with the frequency transposition function can be excessive for certain applications. For example, a digital audio processing system that has a latency of 1025 samples may use a legacy virtual bass system that adds an additional 3200 samples of delay. This can cause a total delay to exceed 88 milliseconds, given a sampling frequency (fs) of 48 kHz. This amount of latency is generally problematic and even prohibitive for gaming and telecommunications applications, where a latency of about 100 milliseconds starts to become noticeable in terms of audible signal delay.
- Traditional transposer systems used in legacy virtual bass systems, such as the transposer system shown in document US 2012 (0008788), use symmetric time domain windows for the analysis and synthesis stages of the time-to-frequency and frequency-to-time transforms, respectively. FIG. 1B illustrates the delay associated with symmetric windows used in legacy virtual bass systems, as known in the prior art. FIG. 1B graphically illustrates the delay imposed by a second-order transposer, i.e., a transposer that generates 2nd order harmonics. As shown in time plot 100, the center of one of the stylistic symmetric analysis windows is chosen as the time zero reference, and new input samples 104 can be added from time t0 in the analysis phase 102, assuming a time stride SA of the analysis windows. Time plot 110 shows the time stretch duality of the transposer, where t0 is stretched to 2·t0 in the synthesis phase 112.
- In a HQMF (Hybrid Quadrature Mirror Filter) bank based audio processing system, the input signal to the CQMF (Complex Quadrature Mirror Filter) analysis stage and the output signal from the CQMF synthesis stage generally both have the same sampling frequency fs , where fs is usually set to 44.1 or 48 kHz. The input signal sampling rate to the virtual bass process may be fs /64 since the system is usually processing the first CQMF signal only from a 64-channel CQMF bank. It should be noted that CQMF sizes other than 64 channels could also be used. The transposed output from the legacy virtual bass processing system has a sampling frequency of 2·fs /64 because of the combined transposition function using a factor two base transposition factor, resulting in a factor two bandwidth expansion. In a combined transposer, the base transposition factor is the factor where the source transform bins (or frequency bands) are mapped in a one-to-one relationship to the target transform bins (or frequency bands), i.e., there is no interpolation or decimation involved in the source to target bin mapping. The base transposition factor also governs the relation between the time strides of the analysis and synthesis windows. More specifically, the synthesis time stride equals the analysis time stride multiplied by the base transposition factor. The delay in output samples from a 64-channel CQMF based system for a case in which L = 64 and SA = 4, becomes:
- In addition to this delay, a delay from the Nyquist filter bank analysis stage processing of the two virtual bass output CQMF sub-band signals is added. This delay may be on the order of 384 samples, thus giving a total delay of 2816 + 384 = 3200 samples for this example prior art legacy virtual bass processing system.
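The legacy latency figures quoted in the background can be tallied in a few lines. The sketch below uses only numbers stated in the text (2816, 384 and 1025 samples at 48 kHz); the variable names are hypothetical.

```python
FS_HZ = 48_000

transposer_delay = 2816   # legacy transposer delay in output samples
nyquist_delay = 384       # Nyquist filter bank analysis stage
host_latency = 1025       # example latency of the surrounding audio processing system

virtual_bass_delay = transposer_delay + nyquist_delay
total_delay = host_latency + virtual_bass_delay
total_delay_ms = 1000.0 * total_delay / FS_HZ

print(virtual_bass_delay, total_delay, round(total_delay_ms, 2))
```

The total of 4225 samples corresponds to slightly more than 88 ms at 48 kHz, which is the figure the background cites as problematic for gaming and telecommunications use.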
- One solution to the latency imposed by legacy virtual bass systems is to change the actual processing circuitry, such as the harmonic generator, for example by replacing the harmonic transposer with alternative components. However, this potentially adds a great deal of cost and complexity to the system and may also negatively impact the audio quality.
- The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
- Embodiments include a latency reduction system in a virtual bass processing system that performs harmonic transposition on low frequency components of an audio signal to generate transposed data indicative of harmonics. The harmonic transposition process uses a base transposition factor greater than two, and generates the harmonics in response to frequency-domain values determined by transform and inverse transform stages that use asymmetric analysis and synthesis windows. An enhanced audio signal is generated by combining a virtual bass signal with the delayed audio signal through the use of Nyquist analysis filter banks that comprise truncated prototype filters. The virtual bass signal may be allowed to lag the delayed audio signal by a defined time period when combining with the audio signal to further reduce the latency caused by the harmonic transposition process.
- Embodiments include a method of reducing latency in a virtual bass generation system by performing harmonic transposition on low frequency components of an input audio signal to generate transposed data indicative of harmonics, wherein the harmonic transposition uses a base transposition factor of an integer value greater than two. It generates the harmonics in response to frequency-domain values determined by a time-to-frequency domain transform stage and a subsequent inverse frequency-to-time domain transform stage through the use of asymmetric analysis and synthesis windows for the time-to-frequency domain transform and inverse frequency-to-time domain transforms. The input audio signal is a sub-banded CQMF (complex-valued quadrature mirror filter) signal and samples of the input audio signal may be pre-processed to generate critically sampled audio indicative of the low frequency components.
- In an embodiment, the method processes the input audio signal through an analysis filter bank or transform to provide a set of analysis sub-band signals or frequency bins from the low frequency components, computes a set of synthesis sub-band signals or frequency bins using the base transposition factor B and transposition factor T, and processes the analysis sub-band signals or frequency bins through a synthesis filter bank or transform to generate a high frequency component from the set of synthesis sub-band signals. This represents a standard way of doing transposition, i.e., performing forward FFT transforms followed by non-linear processing including transform bin mapping, and then performing inverse FFT transforms. The method may further include generating a virtual bass signal in response to the transposed data, and generating an enhanced audio signal by combining the virtual bass signal with the input audio signal by applying one or two analysis filter banks to the virtual bass audio output signal, wherein the analysis filter banks comprise truncated prototype filters that have a defined number of filter coefficients removed. The method may yet further include a lag of the virtual bass signal by a pre-defined time period relative to the input audio signal, by combining the virtual bass signal with the input audio signal delayed a pre-defined time period shorter than the processing delay of the virtual bass system would imply, to generate an enhanced audio signal comprising time lagged virtual bass processed sub-band samples combined with delayed input sub-band samples.
- The base transposition factor under some embodiments extends the input audio signal in the frequency domain to a degree proportionate to the value of the base transposition factor to produce a transposed audio signal, and this base transposition factor may be an even integer value between 4 and 16. In an embodiment, the analysis filter banks operating on the transposer CQMF output sub bands comprise an eight-channel Nyquist filter bank and a four-channel Nyquist filter bank, and the defined number of removed prototype filter coefficients comprises six coefficients. In a further embodiment, the input CQMF signal is routed directly from a preceding CQMF analysis bank channel 0 output, hence bypassing a subsequent Nyquist filter bank stage and so avoiding the related delay.
- Embodiments of the method may further include generating the low frequency components by performing a frequency domain oversampled transform on the input audio signal by generating windowed and zero-padded samples at a defined sample frequency (using the analysis time stride). The pre-defined time period when combining the virtual bass signal with the delayed input audio signal may be a value selected from the range of 0 samples to 1000 samples, since the virtual bass signal may be allowed to lag the wide band input audio signal by up to 20 ms without noticeable degradation of the enhanced audio signal. In an embodiment, the asymmetric analysis and synthesis windows are configured such that a longer portion of each analysis window is stretched toward past input samples, and a longer portion of each synthesis window is stretched toward future output samples.
- Embodiments are also directed to systems or apparatus elements configured to implement at least some of the methods described above.
- In the following drawings like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
-
FIG. 1A illustrates the transposition of frequency components from an inaudible frequency range to an audible frequency range in a known virtual bass processing system. -
FIG. 1B illustrates the delay associated with symmetric windows used in legacy virtual bass systems, as known in the prior art. -
FIG. 2 is a generalized block diagram of a virtual bass processing system that implements latency reduction processes under an embodiment. -
FIG. 3A illustrates a pre-processing Hybrid filter bank stage in a HQMF based system under an embodiment. -
FIG. 3B illustrates a preceding Nyquist synthesis filter bank stage of a virtual bass processing system under an embodiment. -
FIG. 3C is a more detailed diagram of the virtual bass processing system illustrated inFIG. 2 , under an embodiment. -
FIG. 4 is a block diagram of the principal functional components utilized by a virtual bass latency reduction process and system, under an embodiment. -
FIG. 5A is a table illustrating the delay associated with a first hop size for a virtual bass latency reduction system using different orders of the base transposition factor, under an embodiment. -
FIG. 5B is a table illustrating the delay associated with a second hop size for a virtual bass latency reduction system using different orders of the base transposition factor, under an embodiment. -
FIG. 5C is an example plot of time responses of an asymmetric window compared to certain legacy symmetric windows, and FIG. 5D is an example plot of frequency responses of an asymmetric window compared to certain legacy symmetric windows. -
FIG. 6 illustrates the use of asymmetric windows and the associated delay imposed by a B-order base transposer, under an embodiment. -
FIG. 7A is a table illustrating the total latency values for a first hop size for a virtual bass latency reduction system that uses asymmetric transform windows and different orders of the base transposition factor, under an embodiment. -
FIG. 7B is a table illustrating the total latency values for a second hop size for a virtual bass latency reduction system that uses asymmetric transform windows and different orders of the base transposition factor, under an embodiment. -
FIG. 8 is a block diagram illustrating an audio processing system that includes a virtual bass generation system and latency reduction system, under an embodiment. - Embodiments of systems and methods are described for reducing latency and algorithmic delays in transposer-based virtual bass systems. Such systems and methods utilize higher-order base transposition factors, low latency asymmetric transform windows, truncated Nyquist prototype filters, a virtual bass signal that is time lagged with respect to the original audio signal, and a bypassed Nyquist analysis filter bank in a preceding Hybrid filter bank stage.
- Throughout this disclosure, including in the claims, the expression performing an operation "on" a signal or data (e.g., filtering, scaling, transforming, or applying gain to, the signal or data) is used in a broad sense to denote performing the operation directly on the signal or data, or on a processed version of the signal or data (e.g., on a version of the signal that has undergone preliminary filtering or pre-processing prior to performance of the operation thereon). The expression "transposer" is used in a broad sense to denote an algorithmic unit or device that performs pitch-shifting or time-stretching of a real or complex-valued input signal, for parts of, or the entire available input signal spectrum. The expressions "transposer", "harmonic transposer", "phase vocoder", "high frequency generator" or "harmonic generator" may be used interchangeably. The expression "system" is used in a broad sense to denote a device, system, or subsystem. For example, a subsystem that implements a decoder may be referred to as a decoder system, and a system including such a subsystem (e.g., a system that generates X output signals in response to multiple inputs, in which the subsystem generates M of the inputs and the other X - M inputs are received from an external source) may also be referred to as a decoder system. The term "processor" is used in a broad sense to denote a system or device programmable or otherwise configurable (e.g., with software or firmware) to perform operations on data (e.g., audio, or video or other image data). Examples of processors include a field-programmable gate array (or other configurable integrated circuit or chip set), a digital signal processor programmed and/or otherwise configured to perform pipelined processing on audio or other sound data, a programmable general purpose processor or computer, and a programmable microprocessor chip or chip set. 
The expressions "audio processor" and "audio processing unit" are used interchangeably, and in a broad sense, to denote a system configured to process audio data. Examples of audio processing units include, but are not limited to encoders (e.g., transcoders), decoders, vocoders, codecs, pre-processing systems, post-processing systems, and bitstream processing systems (sometimes referred to as bitstream processing tools).
- Embodiments are directed to systems and methods of decreasing virtual bass delay without requiring substantial changes to existing virtual bass processing components, such as the harmonic transposer used in a virtual bass processing system. Aspects of the virtual bass latency reduction system and method may be used in conjunction with a harmonic generator (transposer) in audio codecs (e.g., in a decoder). Aspects of the virtual bass latency reduction system and method may also be used in conjunction with other transposer or phase vocoder systems, e.g., traditional phase vocoders used for general time-stretching or pitch-shifting of audio signals.
- As shown generally in
FIG. 1A, virtual bass generation methods using harmonic transposition involve the transposition of frequency components from an inaudible frequency range to an audible frequency range in order to improve playback of bass content in limited playback equipment, such as through small speakers that cannot physically reproduce the missing lower frequencies. Embodiments of the virtual bass latency reduction system and method improve upon virtual bass generation methods that perform harmonic transposition on low frequency components of an audio signal to generate transposed data indicative of harmonics that are expected to be audible during playback, generate a virtual bass signal in response to the transposed data, and generate an enhanced audio signal by combining the virtual bass signal with the (delayed) input audio signal. Typically, the enhanced audio signal provides an increased perceived level of bass content during playback of the enhanced audio signal by one or more loudspeakers that cannot physically reproduce the low frequency components. - The harmonic transposition performed by the virtual bass generation method employs combined transposition to generate harmonics using a second-order transposer and at least one higher order transposer (typically, a third-order and a fourth-order, and optionally at least one additional higher order transposer) of each of the low frequency components, such that all of the harmonics are generated in response to frequency-domain values determined by a common time-to-frequency domain transform stage (e.g., by performing phase multiplication or other manipulation of the phase on frequency coefficients resulting from a single time-to-frequency domain transform), followed by a common frequency-to-time domain transform (in practice, the common frequency-to-time domain transform is split up into two smaller transforms in order to adapt to the bandwidths and sampling frequencies of the sub-bands of the CQMF framework).
-
FIG. 2 is a block diagram of a virtual bass processing system that implements or is used in conjunction with certain latency reduction processes under an embodiment. In an embodiment, the virtual bass processing system 200 takes as input 201 (input A), a plurality of complex-valued sub-band samples (HQMF samples) from a so-called Hybrid filter bank. In an embodiment, a Hybrid filter bank preceding the virtual bass process has separated an original time domain audio input signal into such multiple Hybrid sub-bands 201 (which are described in further detail below), and they may be buffered by input buffers 206. The buffered input is then processed by a Nyquist synthesis filter bank 208 that performs the synthesis function in order to reconstitute a single complex-valued QMF (CQMF) domain signal 202 (signal C) indicative of low frequency audio content (e.g., between 0 and 375 Hz). In another embodiment, the virtual bass system includes a latency saving mechanism by bypassing the Nyquist analysis filter bank stage in the preceding Hybrid filter bank. This allows the system to save the delay associated with the Nyquist analysis bank (e.g., 384 samples) by feeding the CQMF channel 0 signal as input 203 (input B) directly to the virtual bass module. As shown in FIG. 2, one of the two inputs (signal C 202 or input B 203) is selected by selector 204, and the selected signal comprises a virtual bass input signal 205 (signal D) that is further processed by the transposer 209. - A transposer (or phase vocoder) is generally the combination of a time-to-frequency transform or a filter bank followed by a non-linear stage (performing phase multiplication or phase shifting) followed by the frequency-to-time transform or filter bank. Thus, as shown in
FIG. 2, transposer 209 comprises a time-to-frequency transform component 210, a non-linear stage 212, and a frequency-to-time transform 214. The non-linear stage 212 within transposer 209 is a processing block that modifies the phase and applies certain gain (amplitude) control signals to the sub-band or transform components of the signal. The transposed signals are then buffered by output buffers 216 and subsequently processed by Nyquist analysis filter banks 218 that perform the analysis function that decomposes the virtual bass output CQMF signals into sub-bands corresponding to the Hybrid sub-band samples (HQMF) of the input signal 201. A delayed and unprocessed version of the input A signal 220 is mixed with the Nyquist filter bank 218 output to produce an enhanced audio output signal 222 comprising the virtual bass output signal plus the delayed input signal. - Although embodiments may be directed to the use of Nyquist filter banks for certain functions, such as
synthesis 208 and analysis 218 stage processing, it should be noted that other types of filter banks or frequency splitting or partitioning circuits and techniques may also be used. In other embodiments, the above-mentioned filter banks or frequency splitting or partitioning circuits and techniques may not be present at all. -
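The phase multiplication performed by the non-linear stage can be illustrated with a minimal sketch. It is an illustration only: the helper `multiply_phase` and the test signal are hypothetical, and the operation is applied sample by sample to a single complex sinusoid rather than to FFT bins as in the transposer described above. It shows that an integer factor can be applied to the wrapped phase directly, with no unwrapping step.

```python
import cmath
import math

def multiply_phase(z: complex, factor: int) -> complex:
    """Keep the magnitude of z and multiply its (wrapped) phase by an integer."""
    magnitude, phase = cmath.polar(z)
    return cmath.rect(magnitude, factor * phase)

# Hypothetical test signal: one complex sinusoid at 0.05 cycles per sample.
freq = 0.05
signal = [0.8 * cmath.exp(2j * math.pi * freq * n) for n in range(16)]

B = 3  # illustrative integer transposition factor (assumption)
transposed = [multiply_phase(z, B) for z in signal]

# Phase advance between two consecutive output samples, in cycles per sample.
step = cmath.phase(transposed[5] / transposed[4]) / (2 * math.pi)
print(round(step, 6))  # frequency of the output sinusoid
```

Because the factor is an integer, any 2π ambiguity in the wrapped phase is multiplied into another multiple of 2π, so the output frequency is B times the input frequency while the magnitude is unchanged.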
FIGS. 3A-C are more detailed diagrams of the virtual bass processing system illustrated in FIG. 2. FIG. 3A illustrates a pre-processing Hybrid filter bank stage 300, that is, a stage that typically is not part of, but instead precedes the virtual bass system. A Hybrid filter bank may be a CQMF bank in which a certain number of the lowest CQMF bands are further processed by Nyquist filter banks of pre-determined sizes in order to increase the frequency resolution of the low frequency range. The combination of low frequency sub-band samples from the Nyquist analysis stages and the remaining CQMF channels is referred to as Hybrid sub-band samples, or an HQMF (Hybrid QMF) signal. As shown in FIG. 3A, a time domain input signal 302 is input to a 64-channel CQMF analysis filter bank 304. In an embodiment, one output of this filter bank, the CQMF channel 0 (denoted signal B) 306, is fed directly to the virtual bass module 330 of FIG. 3C (this signal corresponds to input B 203 of FIG. 2). It should be noted that the signal B 306 bypasses the Nyquist analysis filter bank 307, and hence avoids the associated delay. CQMF channels - As shown in
system 320 of FIG. 3B, a plurality of complex-valued Hybrid sub-band samples (signal A) 322 are input to a Nyquist synthesis filter bank stage 324. The virtual bass module 330 of FIG. 3C is assumed to be one module amongst other modules in a system that operates on Hybrid sub-band samples (HQMF samples). Hence, signal A 310 of FIG. 3A may undergo processing by other modules after the pre-processing filter bank stage 300 before becoming input A 322 of FIG. 3B. In an example embodiment, the first 8 Hybrid sub-bands, i.e., the sub-bands from the low frequency, eight-channel (8-ch) Nyquist filter bank 307 (which produce a signal bandwidth of roughly 344-375 Hz depending on the sampling rate) are processed. Because a Nyquist filter bank, unlike the CQMF bank, is not down-sampled, the Nyquist filter bank synthesis step is particularly straightforward: it is just a summation of the sub-band samples for each CQMF (or HQMF) time slot. After summation of the eight lowest Hybrid sub-band samples in stage 324, the system has reconstituted the CQMF channel 0 signal C 326, which becomes input 332 to the virtual bass module 330 of FIG. 3C. -
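The summation-based Nyquist synthesis step described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `nyquist_synthesis` and the sample data are hypothetical, and the input is taken to be a list of time slots, each holding the eight lowest complex-valued Hybrid sub-band samples.

```python
def nyquist_synthesis(hybrid_slots):
    """Rebuild one CQMF channel by summing the sub-band samples of each time slot."""
    return [sum(slot) for slot in hybrid_slots]

# Hypothetical data: 3 time slots, each with 8 complex Hybrid sub-band samples.
slots = [[complex(k + 1, -t) for k in range(8)] for t in range(3)]

# One complex CQMF channel 0 sample per time slot.
cqmf_channel0 = nyquist_synthesis(slots)
print(cqmf_channel0)
```

No interpolation or up-sampling is involved, which reflects the statement that the Nyquist bank is not down-sampled.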
FIG. 3C illustrates a virtual bass system that implements or is used in conjunction with certain latency reduction processes, under an embodiment. The virtual bass module 330 of FIG. 3C has signal D 332 as input. In an embodiment where the preceding Nyquist analysis filter bank 307 is bypassed, signal D 332 may be routed from signal B 306 of FIG. 3A. In another embodiment, signal D 332 may be fed from signal C 326 of the Nyquist synthesis stage 320 of FIG. 3B. In both embodiments, signal D 332, i.e., the input signal to the virtual bass module, is a single complex-valued CQMF signal (e.g., the first channel (channel 0) from a set of CQMF sub-band signals). - In a virtual bass application, an optional dynamics processing function may be performed by
dynamics processor 336 in order to change the dynamics of the virtual bass input signal. The processor 336 may be used to decrease the level of weak bass and maintain or enhance strong bass, i.e., be used as an expander. This scheme is in agreement with the shapes of the Equal Loudness Contours (ELC) in the bass range, where the loudness curves are flatter in frequency for louder signals and steeper for signals of weaker loudness. Weaker bass can hence be attenuated more than stronger bass when generating harmonics in order to maintain the relative loudness between the fundamental component and the generated harmonics. The gain of the dynamics processor 336 may be controlled by a running average energy signal, e.g., the running average energy of a down-mixed (mono) version of the first CQMF band signal 332. - For the embodiment of
system 330, a first windowing function using a window size L (including zero-padding up to length N) 338, forward FFT 340 and modulation function 342 are performed on the (possibly dynamics processed) CQMF signal prior to input to the non-linear processing block 344. In an embodiment of the invention, the window shape is asymmetric. In another embodiment, the transposer (comprising components 338 to 356) represents an improved phase vocoder that uses an interpolation technique referred to as "combined transposition" to generate second, third, fourth, and possibly higher order harmonics (transposition factors), using the same FFT analysis/synthesis chain as for the base transposer. In general, such combined transposition saves computational complexity, though the quality of harmonics other than the base order harmonics may be somewhat compromised. Without combined transposition, at least either the forward or the inverse transforms need to be separate for the different transposition factors. The non-linear processing block 344 uses integer transposition factors, which makes redundant certain phase estimation, phase unwrapping, or phase locking techniques that are generally unstable and inexact as used in many standard phase vocoders. In one embodiment, the phase multipliers 344 use a base transposition factor B higher than 2, such as 8, or any other appropriate value. - The transposer 338-356 uses oversampling in the frequency domain (i.e., zero-padded analysis and synthesis windows in
blocks 338 and 356) to improve the reproduction of impulsive (percussive) sounds, which is paramount when used in the bass frequency range. Without such oversampling, percussive drum sounds would likely generate at least some pre- and post-echo artifacts, making the bass blurry and indistinct. In an embodiment, the oversampling factor F is selected to be at least F = (B+1)/2, where B is the base transposition factor (e.g., B = 8). This helps to ensure that pre- and post-echoes are suppressed for isolated transient sounds. - As shown in
FIG. 3C, the transposer includes gain and slope compensation per FFT bin applied by amplifiers 346 following the phase multiplier circuits (non-linear processing block 344). This allows overall gains for different transposition factors to be set independently. For example, gains can be set to approximate certain equal loudness contours (ELC). As an approximation, the ELC can be adequately modeled by straight lines on a logarithmic scale for frequencies below 400 Hz. In this case, odd order harmonics can be attenuated to a greater extent since odd order harmonics (e.g., third, fifth, etc.) can sometimes be perceived as being more harsh than even order harmonics, although being important for the resulting virtual bass effect. Each transposed signal may additionally have a slope gain, i.e., a roll-off attenuation factor, measured in e.g., dB per octave. This attenuation is also applied per bin in the transform domain by amplifiers 346. - In a non-Hybrid filter bank based system, e.g., a time domain system, taking
signal 302 of FIG. 3A as input, the transposer 338-356 would directly operate on a time domain signal of full sampling rate (e.g., 44.1 or 48 kHz), and then employ an FFT size of roughly 4096 lines, in order to provide an adequate resolution in the low frequency (bass) range. In an embodiment, all processing, however, is performed on CQMF channel 0 sub-band samples (signal D 332 of system 330). This provides certain advantages over normal processing practices, such as saving computational complexity by processing only the signal of interest in the transposer, i.e., by processing a critically sampled (or maximally decimated) low-pass signal. For example, by using a fourth-order base transposer, the virtual bass system expands the bandwidth of the input signal by a factor of four. In general, a virtual bass system is not required to output a signal with a bandwidth above roughly 500 Hz. This means that the first CQMF channel (channel 0) having a bandwidth of 375 Hz (for fs = 48 kHz) is more than adequate for the virtual bass input, and the first two CQMF channels (channels 0 and 1) have enough bandwidth (750 Hz at fs = 48 kHz) for the virtual bass output. Having CQMF channel 0 as input, the system can process the complex-valued samples using an FFT transform of size 64 (4096/64) instead of 4096, where the decrease by a factor of 64 comes from the down-sampling factor of the CQMF bank, which also equals the reduced bandwidth of the first CQMF sub-band signal compared to the time domain input signal. Because of the inherent bandwidth expansion, the output from the transposer needs to be transformed to CQMF bands CQMF band module 348 of the virtual bass module 330 and the CQMF filter response compensation may be done by multipliers 350. In other embodiments, the CQMF filter response compensation may be done on the full (e.g., 64-lines in the example above) FFT spectrum before the FFT split module 348. - As further shown in
FIG. 3C, the output from the CQMF filter response compensation blocks 350 is input to modulation steps 352 followed by inverse FFT circuits 354, using transform sizes of N/B points, and subsequent windowing and overlap/add steps 356, using window lengths L/B. In an embodiment of the invention, the window shapes are asymmetric. The modulation steps 352 may also be applied before the FFT split 348 and CQMF filter response compensation 350 blocks. The output signals from the windowing and overlap/add circuits 356 are two CQMF signals, containing the virtual bass signal to be mixed with the delayed HQMF signal A 364. However, both signals first need to be filtered through 8- and 4-channel Nyquist analysis filter banks 360 respectively to fit in the Hybrid domain. In an embodiment of the invention, the Nyquist analysis filter banks 360 use truncated prototype filters. The HQMF output from the filter banks 360 may be band pass filtered and mixed with a delayed input component A 364 in module 362 to produce the enhanced audio output HQMF signal 366. In an embodiment, the delay of input A 364 to the Hybrid band mix block 362 is less than the virtual bass system delay (minus the Nyquist analysis delay if signal B 306 is used as input) to comprise a time lagged virtual bass signal. - The phase relations between the sub-band signals coming from a CQMF analysis bank will not be maintained when performing the FFT split as outlined above. To alleviate this in an embodiment,
system 330 employs phase compensation by an exp(-j·π/2) multiplication 358 on the CQMF channel 1 before the Nyquist analysis blocks 360. The specific argument to the phase compensation function 358 is dependent on the modulation scheme used by the preceding CQMF bank 304 of FIG. 3A and may differ between embodiments. Also, the compensation factor 358 may be moved and absorbed in other processing blocks. - As described in the background section, the virtual bass processing system introduces certain delays when processing the input signal. With reference to
FIG. 1B, the delay (measured on the transposer output sampling frequency) of the legacy transposer can be expressed as D = 3·L/2 - 2·SA, where L is the transposer window size and SA is the analysis stride or hop-size. In a system in which L = 64 and SA = 4, the total delay of the transposer and the Nyquist filter bank analysis stage can be on the order of 3200 samples, as described previously. - In an embodiment, the virtual bass processing system includes components that perform certain steps to reduce the latency associated with virtual bass processed content.
FIG. 4 is a block diagram of the principal functional components utilized by a virtual bass latency reduction process and system, under an embodiment. As shown in diagram 400 of FIG. 4, the latency reduction process comprises the use of higher order base transposition factors 402, low-latency asymmetric transform windows 404, truncated Nyquist prototype filters 406, and a time lagged virtual bass signal 408. Each of the functional components of diagram 400 may be used alone or in conjunction with one or more of the other components to help reduce the latency of the virtual bass processed content. Diagram 400 may represent a system, such as when each of the components 402-408 is embodied as a hardware component, such as circuits, processors, and so on. The diagram may also represent a process, such as when each of the components 402-408 is implemented as an act performed by a functional component, such as a computer-implemented process executed by one or more processors. Alternatively, diagram 400 may represent a hybrid system and method wherein certain components may be implemented in hardware circuitry and others may be implemented as performed method steps. The components 402-408 may be implemented as separate stand-alone components, or they may be combined in one or more consolidated latency reduction functions. A detailed description of the composition and operation of each component of system 400 follows below. -
- In Eq. 3, the
base transposition factor 2 of the legacy system is replaced by the arbitrary integer base transposition factor B. Note that Eq. 3 refers to the delay in output samples of a CQMF based framework having 64 channels. It can be verified that for constant L and SA, the delay is decreased for increasing B. FIG. 5A is a table illustrating the delay associated with a first hop size, and FIG. 5B is a table illustrating the delay associated with a second hop size for a virtual bass latency reduction system under an embodiment. Table 1 of FIG. 5A illustrates the latency for a hop size of SA = 4, for various window sizes (L = 16 to 128) and base transposition factors (B = 2 to 16). In comparison, Table 2 of FIG. 5B illustrates the latency for a hop size of SA = 2, for the same various window sizes (L = 16 to 128) and base transposition factors (B = 2 to 16). As can be seen in FIGS. 5A and 5B, by increasing the base transposition factor from 2 to 8, for example, a significant latency reduction can be achieved (e.g., from 2816 to 2048 samples for the nominal case where L = 64 and SA = 4). - With reference to
FIG. 3C, in the combined transposer 338-356, when generating higher order transposition factors T, where T is greater than B (T > B), the transposer source ranges are smaller than the transposer target ranges in the analysis transform spectrum. The target bins result from interpolation of the source bins. When generating lower order transposition factors using a higher order base transposer, i.e., when T is less than B (T < B), the source ranges will be larger than the target ranges and the target bins result from decimation of source bins. However, also for the case T < B, when T is odd, the source bin index derived as k = n·B/T, where n is the target bin index, will generally not be an integer and hence the target bin will be derived from interpolation of two consecutive source bins. - The increased order of the base transposition factor has certain implications on the virtual bass process. First, control needs to be established to enforce the transposer source range to stay within the analysis transform range (i.e., within the
range 0 to N-1). Second, compared with a system using a base transposition factor of 2, the two synthesis transforms 354 will now be of size N/B instead of N/2, where N is the analysis transform size. This means that the synthesis window will be decimated by a factor of B instead of 2, and the spectrum splitting 348 along with the gain-vectors for filter response compensation 350 will also be downscaled accordingly. This is a consequence of the increased bandwidth expansion for higher values of B; the transposer output inherently covers a frequency range of B CQMF bands (assuming an input of one CQMF band), where only the first two will actually be synthesized, thus saving complexity. For a base transposition factor B = 8 and a frequency domain oversampling factor F = 4, the two synthesis transform sizes are Ns = F·L/B = 4·64/8 = 32, and the synthesis transform windows 356 have only L/B = 64/8 = 8 taps. - The quality of the transposed signals is governed by the base transposition factor and is reduced for higher transposition orders, but can be improved by using a decreased analysis hop-size (increased oversampling in the time domain). Moreover, to maintain the quality for percussive sounds (transients), the order of frequency domain oversampling needs to increase for higher base transposition factors. However, the increased oversampling in both time and frequency may add to the computational complexity of the transposer. In an embodiment, the analysis hop-size is decreased by a factor of two compared to the legacy system. A base transposer of factor B = 8 will require a frequency domain oversampling factor of at least F = (B+1)/2 = 4.5. In an embodiment, the system uses a factor four oversampling (F = 4) and the missing value of 0.5 is generally insignificant in practice as the transform windows are tapered in the ends. Hence, in this embodiment, the computational complexity is increased by a factor of two in total coming from the increased oversampling in time. 
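The transform-size arithmetic above can be reproduced with a short sketch. The function `transposer_sizes` and its dictionary keys are hypothetical names introduced for illustration; the relations N = F·L, synthesis size N/B, synthesis window length L/B and minimum oversampling (B+1)/2 are taken from the text.

```python
def transposer_sizes(L: int, B: int, F: int) -> dict:
    """Derive transform and window sizes from window size L, base factor B, oversampling F."""
    N = F * L  # analysis transform size (window zero-padded from L to N)
    return {
        "analysis_fft": N,
        "synthesis_fft": N // B,           # size of each of the two synthesis transforms
        "synthesis_window_taps": L // B,   # synthesis window decimated by B
        "min_oversampling": (B + 1) / 2,   # lower bound on F stated in the text
    }

sizes = transposer_sizes(L=64, B=8, F=4)
print(sizes)
```

For the nominal values this gives a 256-point analysis transform, 32-point synthesis transforms and 8-tap synthesis windows, with the chosen F = 4 sitting 0.5 below the stated bound of 4.5.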
It should be noted that the increased time oversampling also comes at a price of slightly increased delay, ending up with a total latency of 2176 samples for L = 64, B = 8 and SA = 2, as shown in Table 2 of
FIG. 5B. - Given what is shown in Tables 1 and 2 of
FIGS. 5A and 5B, it may be presumed that an obvious way of decreasing the transposer delay is to use shorter transform windows, and hence smaller analysis and synthesis transform sizes. However, this generally comes at a cost of reduced quality for dense tonal signals, because of the decreased frequency resolution resulting from the shorter transform windows. It has been found that a more robust decrease of the algorithmic delay of the transposer can be achieved by using asymmetric analysis and synthesis windows in the forward and inverse transform stages. Thus, with regard to the low latency asymmetric transform 404 of FIG. 4, in an embodiment, the latency reduction system uses asymmetric analysis and synthesis windows in the forward and inverse transform stages (e.g., windowing stages 338 and 356 of FIG. 3C, respectively). This essentially improves the frequency response of a symmetric window of limited length by extending the "tail" of the window towards samples in the history not contributing to the transform delay. In an even more general embodiment, both the length of the analysis window and the size of the forward transform may be different from that of the synthesis window and the inverse transform. -
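The symmetric-window latency figures quoted above from Tables 1 and 2 can be checked numerically. Eq. 3 is not reproduced in this text, so the expression below is an assumption: it replaces the factor 2 in the legacy delay 3·L/2 - 2·SA = L/2 + 2·(L/2 - SA) by B, and converts from the transposer output rate to output samples of the 64-channel CQMF framework by multiplying by 64/B. The function names are hypothetical.

```python
CQMF_CHANNELS = 64
NYQUIST_ANALYSIS_DELAY = 6 * CQMF_CHANNELS  # 384 samples, as quoted in the text

def transposer_delay(L: int, SA: int, B: int) -> float:
    """Assumed generalisation: L/2 + B*(L/2 - SA); equals 3*L/2 - 2*SA when B == 2."""
    return L / 2 + B * (L / 2 - SA)

def latency_samples(L: int, SA: int, B: int) -> float:
    """Delay in output samples of a 64-channel CQMF framework (assumed conversion)."""
    return CQMF_CHANNELS * transposer_delay(L, SA, B) / B

print(latency_samples(64, 4, 2))  # legacy nominal case
print(latency_samples(64, 4, 8))  # higher order base transposer
print(latency_samples(64, 2, 8))  # halved hop size
print(latency_samples(64, 4, 2) + NYQUIST_ANALYSIS_DELAY)  # with Nyquist analysis stage
```

Under these assumptions the sketch reproduces the quoted values of 2816, 2048 and 2176 samples, and 2816 + 384 gives the total of about 3200 samples cited for the legacy system.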
FIG. 5C is an example plot of a time response of an asymmetric window compared to legacy symmetric Hanning windows. FIG. 5C illustrates the time response as a function of samples (x-axis) versus signal amplitude (e.g., in volts) for a Hanning window of length 64 shown as plot 514 and a Hanning window of length 41 shown as plot 516 versus the time response plot 512 for an asymmetric window of length 64 and delay 40 (a delay equal to the Hanning window of length 41). FIG. 5D is an example plot of frequency responses of an asymmetric window compared to legacy symmetric Hanning windows. FIG. 5D illustrates the frequency response as a function of normalized frequency (x-axis) versus signal amplitude on a logarithmic scale (e.g., in dB) for the Hanning window of length 64 shown as plot 524 and the Hanning window of length 41 shown as plot 526 versus the frequency response plot 522 for the asymmetric window of length 64 and delay 40 (equal to the Hanning window of length 41). As can be seen in FIG. 5D, the main lobe of the asymmetric window has a width in between those of the symmetric Hanning windows, indicating a frequency resolution or selectivity in between the two Hanning windows. - To accommodate asymmetric window transform processing, the transposer algorithm needs to be partially changed compared to the legacy implementation, taking into account the reduced transform delay D of the analysis/synthesis chain. Instead of the frequency modulation by e^(-iπk) following the forward transform and preceding the inverse transform of the legacy system, the asymmetric system requires a
frequency modulation 342 after the analysis transform of: -
- In Eqs. 4 and 5 above, k and n respectively are the transform frequency coefficient indices, N is the analysis transform size, i.e., N = FL, where F is the frequency domain oversampling factor, L is the analysis window size and D is the transform delay. As indicated in
FIG. 3C, the modulation of Eq. 5 may also be applied in modulation stages 352 after the FFT split module 348 and response compensation step 350. -
FIG. 6 illustrates stylistically the use of asymmetric windows and the associated delay imposed by a B-order base transposer, under an embodiment. In a legacy virtual bass system, B is usually set to two, but if the asymmetric window process 404 is used in conjunction with the higher-order base transposition factor process 402, then B will be an integer value of greater than two (e.g., B = 4, 8 or 16). Time plot 600 shows the time zero reference as the group delay of the analysis window (approximately D/2). New samples 604 are added from time t0 in the analysis phase 602. Time plot 610 shows that the time stretch duality of the transposer moves t0 to time B·t0 in the synthesis phase 612 for the new time-stretched samples 614. The total analysis/synthesis chain delay amounts to approximately D/2 + B·(D/2 - SA) in the case where asymmetric windows, such as shown in FIG. 5 (512) or FIG. 6, are used. - As for the symmetric window case, where the frequency domain modulations may be implemented by circular time shifts by N/2 samples, the calculations of Eqs. 4 and 5 above may likewise be implemented by circular time shifts of N - (D/2 - (L - 1)) (mod N) samples before the analysis transform and N - D/2 samples after a (single) synthesis transform respectively. However, when combining asymmetric windows with a higher order base transposition factor, e.g., B = 8, and the FFT split
stage 348, the time shifts after the synthesis transforms will be (N- D/2)/B samples, which may not be an integer value. In this case, a rounded value may be used as an approximation. Additionally, in order to save complexity, the analysis modulation may be combined with the synthesis modulation as a merged synthesis modulation as given by Eq. 6: -
-
- Again, Eq. 7 provides only an approximation of the frequency modulation implemented by Eq. 6 (which in itself may be an approximation) when the argument to the ceil-function ┌·┐ (rounding up to closest integer) is not an exact integer. It should also be noted that Eqs. 5 or 6 above are preferably applied only to the limited part of the coefficients that will be included in the two inverse Fourier transforms.
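As a rough illustration of the circular time shifts discussed for Eqs. 4 and 5, the sketch below evaluates the three shift amounts quoted in the text. The oversampling factor F = 8 and window size L = 64 are hypothetical values chosen only to make N = FL concrete; D = 40 and B = 8 follow the example embodiment. The function names are illustrative, and rounding half up is an assumption, since the text only states that a rounded value may be used.

```python
from fractions import Fraction
import math

def analysis_shift(N, D, L):
    """Circular shift before the analysis transform: N - (D/2 - (L - 1)) (mod N)."""
    return (N - (Fraction(D, 2) - (L - 1))) % N

def synthesis_shift(N, D):
    """Circular shift after a single synthesis transform: N - D/2."""
    return N - Fraction(D, 2)

def split_synthesis_shift(N, D, B):
    """Shift after the split synthesis transforms: (N - D/2)/B.

    Returns the exact value and an integer approximation
    (rounding half up, which is an assumption of this sketch).
    """
    exact = synthesis_shift(N, D) / B
    rounded = math.floor(exact + Fraction(1, 2))
    return exact, rounded

F, L = 8, 64      # hypothetical oversampling factor and window size
N = F * L         # analysis transform size, N = FL
D, B = 40, 8      # transform delay and base transposition factor

print("analysis shift:", analysis_shift(N, D, L))
print("synthesis shift:", synthesis_shift(N, D))
print("split synthesis shift (exact, rounded):", split_synthesis_shift(N, D, B))
```

With these assumed values the split synthesis shift is not an integer, which is the situation in which the text suggests using a rounded approximation.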
-
-
FIG. 7A is a table illustrating the total latency values for a first hop size, and FIG. 7B is a table illustrating the total latency values for a second hop size for a virtual bass latency reduction system that uses asymmetric transform windows, under an embodiment. Table 3 of FIG. 7A illustrates the latency for a hop size of SA = 4, for various transform delay values (D = 15 to 127) and base transposition factors (B = 2 to 16). In comparison, Table 4 of FIG. 7B illustrates the latency for a hop size of SA = 2, for the same various transform delay values (D = 15 to 127) and base transposition factors (B = 2 to 16). As can be seen in Table 4, the latency reduction going from a symmetric 64-tap window (D = 63) to the asymmetric window is 828 samples (2204 - 1376 = 828, for a nominal case where SA = 2 and B = 8).
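The approximate chain delay D/2 + B·(D/2 - SA) quoted for the asymmetric-window case can be evaluated directly, as in the minimal sketch below. The function name is illustrative, and the result is expressed in the same units as D and SA. The sketch does not attempt to reproduce the tabulated totals of FIGS. 7A and 7B, whose exact composition is not spelled out in this passage.

```python
from fractions import Fraction

def chain_delay(D, B, SA):
    """Approximate analysis/synthesis chain delay: D/2 + B*(D/2 - SA)."""
    half = Fraction(D, 2)
    return half + B * (half - SA)

# Compare hop sizes and base transposition factors for a transform delay of D = 40.
for SA in (2, 4):
    for B in (2, 4, 8, 16):
        print(f"SA={SA} B={B:2d} delay={float(chain_delay(40, B, SA))}")
```

Under this expression a larger hop size SA shortens the delay for a given B, in line with the later remark that increasing SA from 2 to 4 reduces latency further.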
- The amount of asymmetry of the transposition windows may vary depending upon the constraints and requirements of the system. In an embodiment and particular implementation, the group delay of the asymmetric window is selected to be close to half of the transform delay in order to maintain adequate transposition quality. Thus, for a transform delay of D = 40, Gd ≈ D/2 = 20. This may be accomplished by including a constraint for the group delay during an optimization phase for design of the asymmetric filter.
- With reference to FIG. 4, a third latency reduction element comprises using truncated Nyquist prototype filters, 406. As shown in FIG. 3C, to be able to mix the virtual bass signal in the Hybrid domain, 8-channel and 4-channel Nyquist analysis filter banks 360 are applied to the virtual bass output CQMF channels (these filter banks correspond to the Nyquist filter banks of FIG. 3A). In an embodiment, the Nyquist analysis filter banks 360 use symmetric 13-tap prototype filters, which can result in a delay of six CQMF samples (e.g., in this case, 6·64 = 384 output samples). By removing the six coefficients of the prototype filter that act on future samples, this entire delay (e.g., 384 samples) may be eliminated. In general, the Nyquist analysis/synthesis chain still provides perfect reconstruction. However, the frequency responses of the Nyquist filter banks using truncated filters may change. Optimization of the remaining filter coefficients may improve the potentially poorer frequency responses of the Nyquist filter banks using truncated filters.
- With reference to FIG. 4, a fourth latency reduction element comprises letting the virtual bass signal lag the original signal, 408. In this case, the latency of the overall system can be reduced as the wide band signal (i.e., the Hybrid signal A 364 of FIG. 3C) is delayed a shorter period of time than the virtual bass system delay actually implies. Informal listening tests have shown that a lag below 20 ms does not hamper the virtual bass effect. This lag corresponds to 960 samples for a 48 kHz audio signal.
- In a particular implementation of an embodiment, the virtual bass signal is allowed to lag the wide band signal by a total of 352 samples (7.33 ms at 48 kHz). Of these 352 samples, 32 samples are coming from the use of the asymmetric transform window as 1376 is not evenly divisible by the CQMF filter bank size of 64. Hence, the delay from the asymmetric window transform can be divided into a wide band latency of 1344 plus a bass lag of 32 samples. The extra lag added on top of the 32 samples is thus 320 samples (5 CQMF samples, corresponding to 6.67 ms at 48 kHz sampling frequency).
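The sample counts quoted for the third and fourth latency reduction elements follow from simple arithmetic, which the sketch below reproduces. Variable and function names are illustrative only; the figures themselves (64-band CQMF, 13-tap prototype, 48 kHz, 1376 samples of asymmetric-window delay, 5 CQMF samples of extra lag) are taken from the text.

```python
CQMF_BANDS = 64       # CQMF filter bank size: output samples per CQMF sample
FS = 48_000           # sampling frequency in Hz

def ms_to_samples(ms, fs=FS):
    return round(ms * fs / 1000)

def samples_to_ms(n, fs=FS):
    return 1000.0 * n / fs

# Third element: a symmetric 13-tap prototype delays by its (13 - 1)/2 look-ahead taps.
nyquist_delay_cqmf = (13 - 1) // 2
nyquist_delay_out = nyquist_delay_cqmf * CQMF_BANDS

# Fourth element: split the asymmetric-window delay into whole CQMF samples plus a remainder.
window_delay = 1376
wideband_latency = (window_delay // CQMF_BANDS) * CQMF_BANDS
lag_from_window = window_delay - wideband_latency
extra_lag = 5 * CQMF_BANDS
total_lag = lag_from_window + extra_lag

print("Nyquist delay (output samples):", nyquist_delay_out)
print("20 ms lag limit (samples):", ms_to_samples(20))
print("wide band latency / lag from window:", wideband_latency, lag_from_window)
print("total bass lag:", total_lag, "samples =", round(samples_to_ms(total_lag), 2), "ms")
```

The total lag of 352 samples stays well below the 960-sample (20 ms) limit reported from the informal listening tests.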
- The different latency reduction elements 402-408 of FIG. 4 may be used in any practical number of combinations to achieve a reduction in virtual bass system latency. Furthermore, the appropriate variables of each latency reduction method may be altered to increase the latency in relation to any perceived decrease in virtual bass signal quality. In an embodiment, the four latency reduction elements were implemented using the following values: base transposition factor B = 8, hop-size SA = 2, transform delay D = 40, truncated Nyquist filters and 320 samples of extra virtual bass lag. In this example case, the resulting virtual bass system delay in output samples was as follows: input B 203 in FIG. 2, and signal B 306 of FIG. 3A as input D 332 in the virtual bass module 330 of FIG. 3C), can save another 384 samples of delay, resulting in a virtual bass system delay of 1024 - 384 = 640 samples (corresponding to 13 ms at 48 kHz sampling frequency).
- The delay of 640 samples in this example case is significantly less than the nominal delay of 3200 samples in the legacy virtual bass system described previously. This delay can be reduced even further by adding more virtual bass lag, by increasing the hop-size SA to 4 instead of 2, or by designing an asymmetric transform window with a resulting analysis/synthesis delay shorter than 40. However, the change of any such values may result in slightly poorer virtual bass quality, though the latency may be further reduced.
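To put the example figures side by side, the short sketch below converts the quoted delays to milliseconds at 48 kHz and expresses the saving relative to the legacy system. The variable names are illustrative; the sample counts (1024, 384, 3200) are those stated in the text.

```python
FS = 48_000  # sampling frequency in Hz

def to_ms(samples, fs=FS):
    return 1000.0 * samples / fs

delay_before_saving = 1024   # output samples, as quoted
extra_saving = 384           # the further saving mentioned in the text
final_delay = delay_before_saving - extra_saving
legacy_delay = 3200          # nominal legacy virtual bass delay, as quoted

reduction = 1.0 - final_delay / legacy_delay

print(f"example delay: {final_delay} samples = {to_ms(final_delay):.1f} ms")
print(f"legacy delay:  {legacy_delay} samples = {to_ms(legacy_delay):.1f} ms")
print(f"relative reduction: {reduction:.0%}")
```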
- Embodiments of a virtual bass latency reduction system as described herein may be used in conjunction with any appropriate virtual bass generation system, such as that illustrated in FIGS. 2 and 3. FIG. 8 is a block diagram illustrating an audio processing system that includes a virtual bass generation system and a latency reduction system, under an embodiment. As shown in FIG. 8, system 800 comprises a virtual bass system 330 as illustrated in FIG. 3C. Virtual bass system 330 receives input audio signals 801 and performs certain frequency transposition functions to produce enhanced audio content for playback through speakers 806 that may be of limited frequency response capability. Certain latencies may be associated with the transposition functions performed by the virtual bass system 330. In an embodiment, a virtual bass latency reduction system 400 (as illustrated in FIG. 4) is provided as a post-process to the virtual bass system 330 to reduce the latencies associated with virtual bass processing. The reduced latency audio signals from the virtual bass systems are passed to a rendering subsystem 802 that is configured to generate speaker feeds that may be fed through amplifier 804 for left and right (or multi-channel) speakers 806.
- Although the virtual bass latency reduction system 400 is shown to be a separate post-process element in system 800, it should be noted that such a latency reduction system may be implemented as part of the virtual bass system 330 (as indicated earlier), or as part of any other appropriate element of system 800, such as a functional component within rendering subsystem 802. Likewise, the virtual bass system 330 may be a legacy virtual bass generation system as outlined in the background, or it may be any other virtual bass generation and processing system that uses harmonic transposition to enhance input audio signals 801 to increase the perceived level of bass content for playback through speakers 806.
- Embodiments of the virtual bass latency reduction system can be used in any audio processing system that renders and plays back digital audio through a variety of different playback devices and audio speakers (transducers). These speakers may be embodied in any of a variety of different listening devices or items of playback equipment, such as computers, televisions, stereo systems (home or cinema), mobile phones, tablets, and other portable playback devices. The speakers may be of any appropriate size and power rating, and may be provided in the form of free-standing drivers, speaker enclosures, surround-sound systems, soundbars, headphones, earbuds, and so on. The speakers may be configured in any appropriate array, and may include monophonic drivers, binaural speakers, surround-sound speaker arrays, or any other appropriate array of audio drivers.
- Aspects of one or more embodiments described herein may be implemented in an audio system that processes audio signals for transmission across a network that includes one or more computers or processing devices executing software instructions. Any of the described embodiments may be used alone or together with one another in any combination. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
- Aspects of the systems described herein may be implemented in an appropriate computer-based sound processing network environment for processing digital or digitized audio files. Portions of the adaptive audio system may include one or more networks that comprise any desired number of individual machines, including one or more routers (not shown) that serve to buffer and route the data transmitted among the computers. Such a network may be built on various different network protocols, and may be the Internet, a Wide Area Network (WAN), a Local Area Network (LAN), or any combination thereof.
- One or more of the components, blocks, processes or other functional components may be implemented through a computer program that controls execution of a processor-based computing device of the system. It should also be noted that the various functions disclosed herein may be described using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, physical (non-transitory), non-volatile storage media in various forms, such as optical, magnetic or semiconductor storage media.
- Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise," "comprising," and the like are to be construed in an inclusive sense as opposed to an exclusive or exhaustive sense; that is to say, in a sense of "including, but not limited to." Words using the singular or plural number also include the plural or singular number respectively. Additionally, the words "herein," "hereunder," "above," "below," and words of similar import refer to this application as a whole and not to any particular portions of this application. When the word "or" is used in reference to a list of two or more items, that word covers all of the following interpretations of the word: any of the items in the list, all of the items in the list and any combination of the items in the list.
- While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims (15)
- A method of generating low latency virtual bass, comprising: receiving an input audio signal; performing harmonic transposition on low frequency components of the input audio signal to generate transposed data indicative of harmonics of the input audio signal; generating a virtual bass signal in response to the transposed data; and generating an enhanced audio signal by combining the virtual bass signal with a delayed version of the input audio signal, wherein the harmonic transposition employs combined transposition using a base transposition order B higher than 2 such that the harmonics include a second order harmonic and at least one higher order harmonic of each of the low frequency components, and characterized in that all of the harmonics are generated in response to frequency-domain values determined by a common time-to-frequency domain transform stage using an asymmetric analysis window, and a subsequent inverse transform determined by a common frequency-to-time domain transform stage using an asymmetric synthesis window.
- The method of claim 1 wherein the input audio signal is a sub-band complex-valued quadrature mirror filter (CQMF) signal indicative of critically sampled or close to critically sampled low frequency audio from a set of CQMF sub-band signals.
- The method of claim 2 wherein the critically sampled or close to critically sampled low frequency input audio is a CQMF channel 0 signal indicative of the lowest frequency band from a set of CQMF sub-band signals.
- The method of claim 3 further comprising: generating transposed data from low frequency components by performing a frequency domain oversampled transform on the input audio signal by generating asymmetrically windowed, zero-padded samples, and performing a time-to-frequency domain transform on the asymmetrically windowed, zero-padded samples, and subsequently performing a non-linear operation on the output from the time-to-frequency domain transform to generate the transposed data from the low frequency components; generating two sets of frequency components from the frequency components processed by the non-linear operation by splitting into a first set of frequency components in a first frequency band and a second set of frequency components in a second frequency band; and further performing a first frequency-to-time domain transform on the first set of frequency components and a second frequency-to-time domain transform on the second set of frequency components, wherein each of the first frequency-to-time domain transform and the second frequency-to-time domain transform have transform sizes B times smaller than the time-to-frequency domain transform; and further applying asymmetric zero-padded windows to the samples from the frequency-to-time domain transforms, wherein the asymmetric zero-padded windows are B times shorter than the asymmetrically windowed, zero-padded samples generated from the input audio signal, thus forming two sets of transposed data.
- The method of claim 4 wherein the first frequency band is the frequency band of CQMF channel 0, and the second frequency band is the frequency band of CQMF channel 1 from a set of CQMF sub-band signals, wherein generating a virtual bass signal in response to the transposed data comprises an analysis filter bank applied to one or both of the two sets of transposed data, wherein the analysis filter bank comprises a truncated version of a symmetric filter.
- The method of claim 1 wherein the delayed version of the input audio signal is delayed a pre-defined time period shorter than the latency of the virtual bass signal and the enhanced audio signal is indicative of a time lagged virtual bass signal.
- The method of claim 3 wherein the input audio CQMF channel 0 is received directly from the analysis CQMF bank output of a pre-processing Hybrid filter bank stage, bypassing the Nyquist analysis filter bank of the pre-processing Hybrid filter bank stage.
- An apparatus for generating low latency virtual bass, comprising: a first component adapted for receiving an input audio signal and adapted for performing harmonic transposition on low frequency components of the input audio signal to generate transposed data indicative of harmonics of the input audio signal; and a second component adapted for generating a virtual bass signal in response to the transposed data and adapted for combining the virtual bass signal with a delayed version of the input audio signal to generate an enhanced audio signal, wherein the harmonic transposition employs combined transposition using a base transposition order B higher than 2 such that the harmonics include a second order harmonic and at least one higher order harmonic of each of the low frequency components, and characterized in that all of the harmonics are generated in response to frequency-domain values determined by a common time-to-frequency domain transform stage using an asymmetric analysis window, and a subsequent inverse transform determined by a common frequency-to-time domain transform stage using an asymmetric synthesis window.
- The apparatus of claim 8 wherein the input audio signal is a sub-band complex-valued quadrature mirror filter (CQMF) signal indicative of critically sampled or close to critically sampled low frequency audio from a set of CQMF sub-band signals.
- The apparatus of claim 9 wherein the critically sampled or close to critically sampled low frequency audio is a CQMF channel 0 signal indicative of the lowest frequency band from a set of CQMF sub-band signals.
- The apparatus of claim 10 further comprising: a third component adapted for generating transposed data from low frequency components by performing a frequency domain oversampled transform on the input audio signal by generating asymmetrically windowed, zero-padded samples, and performing a time-to-frequency domain transform on the asymmetrically windowed, zero-padded samples, and subsequently performing a non-linear operation on the output from the time-to-frequency domain transform to generate the transposed data from the low frequency components; a fourth component adapted for generating two sets of frequency components from the frequency components processed by the non-linear operation by splitting into a first set of frequency components in a first frequency band and a second set of frequency components in a second frequency band; a fifth component adapted for further performing a first frequency-to-time domain transform on the first set of frequency components and a second frequency-to-time domain transform on the second set of frequency components, wherein each of the first frequency-to-time domain transform and the second frequency-to-time domain transform have transform sizes B times smaller than the time-to-frequency domain transform; and a sixth component adapted for applying asymmetric zero-padded windows to the samples from the frequency-to-time domain transforms, wherein the asymmetric zero-padded windows are B times shorter than the asymmetrically windowed, zero-padded samples generated from the input audio signal, thus forming two sets of transposed data.
- The apparatus of claim 11 wherein the first frequency band is the frequency band of CQMF channel 0, and the second frequency band is the frequency band of CQMF channel 1 from a set of CQMF sub-band signals, and wherein generating a virtual bass signal in response to the transposed data comprises an analysis filter bank applied to one or both of the two sets of transposed data, wherein the analysis filter bank comprises a truncated version of a symmetric filter.
- The apparatus of claim 8 further comprising: a timing component adapted for generating a version of the input audio signal delayed a pre-defined time period shorter than the latency of the virtual bass signal; and a mixing component adapted for combining the virtual bass signal with the delayed input audio signal to generate an enhanced audio signal indicative of a time lagged virtual bass signal.
- The apparatus of claim 10 further comprising an interface component adapted for receiving the CQMF channel 0 directly from the analysis CQMF bank output of a pre-processing Hybrid filter bank, bypassing the Nyquist analysis filter bank of the pre-processing Hybrid filter bank stage.
- A computer-readable storage medium storing executable computer program instructions for executing a method according to any of claims 1-7, when run on a computer.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/652,023 US8971551B2 (en) | 2009-09-18 | 2012-10-15 | Virtual bass synthesis using harmonic transposition |
PCT/EP2013/070262 WO2014060204A1 (en) | 2012-10-15 | 2013-09-27 | System and method for reducing latency in transposer-based virtual bass systems |
Publications (2)
Publication Number | Publication Date |
---|---|
EP2907324A1 (en) | 2015-08-19 |
EP2907324B1 (en) | 2016-11-09 |
Family
ID=49293633
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13771123.0A Active EP2907324B1 (en) | 2012-10-15 | 2013-09-27 | System and method for reducing latency in transposer-based virtual bass systems |
EP13188415.7A Active EP2720477B1 (en) | 2012-10-15 | 2013-10-14 | Virtual bass synthesis using harmonic transposition |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP13188415.7A Active EP2720477B1 (en) | 2012-10-15 | 2013-10-14 | Virtual bass synthesis using harmonic transposition |
Country Status (4)
Country | Link |
---|---|
EP (2) | EP2907324B1 (en) |
JP (1) | JP5894347B2 (en) |
CN (1) | CN104704855B (en) |
WO (1) | WO2014060204A1 (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105280189B (en) * | 2015-09-16 | 2019-01-08 | 深圳广晟信源技术有限公司 | The method and apparatus that bandwidth extension encoding and decoding medium-high frequency generate |
KR102578008B1 (en) * | 2019-08-08 | 2023-09-12 | 붐클라우드 360 인코포레이티드 | Nonlinear adaptive filterbank for psychoacoustic frequency range expansion. |
EP4122217A1 (en) | 2020-03-20 | 2023-01-25 | Dolby International AB | Bass enhancement for loudspeakers |
EP4367901A1 (en) * | 2021-07-09 | 2024-05-15 | Soundfocus Aps | Method and transducer array system for directionally reproducing an input audio signal |
EP4367906A1 (en) * | 2021-07-09 | 2024-05-15 | Soundfocus Aps | Method and loudspeaker system for processing an input audio signal |
JP2023130644A (en) * | 2022-03-08 | 2023-09-21 | アルプスアルパイン株式会社 | Acoustic signal processing device, acoustic system, and method for enhancing low-pitched sound feeling |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SE0101175D0 (en) | 2001-04-02 | 2001-04-02 | Coding Technologies Sweden Ab | Aliasing reduction using complex-exponential-modulated filter banks |
TWI339991B (en) * | 2006-04-27 | 2011-04-01 | Univ Nat Chiao Tung | Method for virtual bass synthesis |
US8036903B2 (en) * | 2006-10-18 | 2011-10-11 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Analysis filterbank, synthesis filterbank, encoder, de-coder, mixer and conferencing system |
JP4983694B2 (en) * | 2008-03-31 | 2012-07-25 | 株式会社Jvcケンウッド | Audio playback device |
ES2966639T3 (en) * | 2009-01-16 | 2024-04-23 | Dolby Int Ab | Enhanced harmonic transposition of cross product |
CN101505443B (en) * | 2009-03-13 | 2013-12-11 | 无锡中星微电子有限公司 | Virtual supper bass enhancing method and system |
GB0906594D0 (en) * | 2009-04-17 | 2009-05-27 | Sontia Logic Ltd | Processing an audio singnal |
KR101613684B1 (en) * | 2009-12-09 | 2016-04-19 | 삼성전자주식회사 | Apparatus for enhancing bass band signal and method thereof |
US8638953B2 (en) * | 2010-07-09 | 2014-01-28 | Conexant Systems, Inc. | Systems and methods for generating phantom bass |
PL2596497T3 (en) * | 2010-07-19 | 2014-10-31 | Dolby Int Ab | Processing of audio signals during high frequency reconstruction |
JP5375861B2 (en) * | 2011-03-18 | 2013-12-25 | ヤマハ株式会社 | Audio reproduction effect adding method and apparatus |
CN102354500A (en) * | 2011-08-03 | 2012-02-15 | 华南理工大学 | Virtual bass boosting method based on harmonic control |
TWI575962B (en) * | 2012-02-24 | 2017-03-21 | 杜比國際公司 | Low delay real-to-complex conversion in overlapping filter banks for partially complex processing |
-
2013
- 2013-09-27 WO PCT/EP2013/070262 patent/WO2014060204A1/en active Application Filing
- 2013-09-27 JP JP2015536058A patent/JP5894347B2/en active Active
- 2013-09-27 EP EP13771123.0A patent/EP2907324B1/en active Active
- 2013-09-27 CN CN201380053450.0A patent/CN104704855B/en active Active
- 2013-10-14 EP EP13188415.7A patent/EP2720477B1/en active Active
Also Published As
Publication number | Publication date |
---|---|
JP5894347B2 (en) | 2016-03-30 |
EP2720477B1 (en) | 2016-03-02 |
WO2014060204A1 (en) | 2014-04-24 |
JP2015531575A (en) | 2015-11-02 |
CN104704855B (en) | 2016-08-24 |
CN104704855A (en) | 2015-06-10 |
EP2720477A1 (en) | 2014-04-16 |
EP2907324A1 (en) | 2015-08-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
17P | Request for examination filed |
Effective date: 20150515 |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
AX | Request for extension of the european patent |
Extension state: BA ME |
|
DAX | Request for extension of the european patent (deleted) | ||
RIC1 | Information provided on ipc code assigned before grant |
Ipc: H04R 3/04 20060101AFI20160428BHEP Ipc: G10L 21/038 20130101ALI20160428BHEP |
|
GRAP | Despatch of communication of intention to grant a patent |
Free format text: ORIGINAL CODE: EPIDOSNIGR1 |
|
INTG | Intention to grant announced |
Effective date: 20160607 |
|
GRAS | Grant fee paid |
Free format text: ORIGINAL CODE: EPIDOSNIGR3 |
|
GRAA | (expected) grant |
Free format text: ORIGINAL CODE: 0009210 |
|
AK | Designated contracting states |
Kind code of ref document: B1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
REG | Reference to a national code |
Ref country code: GB Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: REF Ref document number: 844866 Country of ref document: AT Kind code of ref document: T Effective date: 20161115 Ref country code: CH Ref legal event code: EP |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: FG4D |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R096 Ref document number: 602013013882 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LV Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: LT Ref legal event code: MG4D |
|
REG | Reference to a national code |
Ref country code: NL Ref legal event code: MP Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: AT Ref legal event code: MK05 Ref document number: 844866 Country of ref document: AT Kind code of ref document: T Effective date: 20161109 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: NO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170209 Ref country code: LT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: NL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: GR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170210 Ref country code: SE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: FI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: PL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: AT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: HR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: RS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: ES Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: PT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170309 Ref country code: IS Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170309 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: DK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: EE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: RO Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 Ref country code: CZ Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R097 Ref document number: 602013013882 Country of ref document: DE |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SM Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
Ref country code: BE Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
Ref country code: IT Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
Ref country code: BG Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20170209 |
|
PLBE | No opposition filed within time limit |
Free format text: ORIGINAL CODE: 0009261 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 5 |
|
26N | No opposition filed |
Effective date: 20170810 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: SI Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: CH Ref legal event code: PL |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MC Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: IE Ref legal event code: MM4A |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: LU Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170927 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: IE Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170927 |
Ref country code: LI Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170930 |
Ref country code: CH Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170930 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 6 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MT Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES Effective date: 20170927 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: HU Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO Effective date: 20130927 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: CY Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: MK Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: TR Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] |
Ref country code: AL Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT Effective date: 20161109 |
|
REG | Reference to a national code |
Ref country code: FR Ref legal event code: PLFP Year of fee payment: 10 |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R081 Ref document number: 602013013882 Country of ref document: DE Owner name: DOLBY INTERNATIONAL AB, IE Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, AMSTERDAM, NL |
Ref country code: DE Ref legal event code: R081 Ref document number: 602013013882 Country of ref document: DE Owner name: DOLBY INTERNATIONAL AB, NL Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, AMSTERDAM, NL |
|
REG | Reference to a national code |
Ref country code: DE Ref legal event code: R081 Ref document number: 602013013882 Country of ref document: DE Owner name: DOLBY INTERNATIONAL AB, IE Free format text: FORMER OWNER: DOLBY INTERNATIONAL AB, DP AMSTERDAM, NL |
|
P01 | Opt-out of the competence of the unified patent court (upc) registered |
Effective date: 20230512 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: DE Payment date: 20240820 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: GB Payment date: 20240822 Year of fee payment: 12 |
|
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] |
Ref country code: FR Payment date: 20240820 Year of fee payment: 12 |