
WO2010125228A1 - Encoding of multiview audio signals - Google Patents

Encoding of multiview audio signals

Info

Publication number
WO2010125228A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
signals
microphone
directional signal
ambience
Prior art date
Application number
PCT/FI2009/050343
Other languages
English (en)
Inventor
Juha OJANPERÄ
Original Assignee
Nokia Corporation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Corporation filed Critical Nokia Corporation
Priority to PCT/FI2009/050343 priority Critical patent/WO2010125228A1/fr
Publication of WO2010125228A1 publication Critical patent/WO2010125228A1/fr

Links

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/008: Multichannel audio signal coding or decoding using interchannel correlation to reduce redundancy, e.g. joint-stereo, intensity-coding or matrixing
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming

Definitions

  • the present invention relates to multiview audio signals, and more particularly to encoding, transmission and reconstruction of multiview audio signals.
  • a two/multi-channel audio signal is processed such that the audio signals to be reproduced on different audio channels differ from one another, thereby providing the listeners with an impression of a spatial effect around the audio source.
  • the spatial effect can be created by recording the audio directly into suitable formats for multi-channel or binaural reproduction, or the spatial effect can be created artificially in any two/multi-channel audio signal, which is known as spatialization.
  • parametric audio coding methods such as binaural cue coding (BCC) enable multi-channel and surround audio coding and representation.
  • the common aim of the parametric methods for coding of spatial audio is to represent the original audio as a downmix signal comprising a reduced number of audio channels, for example as a monophonic or as a two-channel (stereo) sum signal, along with a bit stream of parameters describing the spatial image.
  • the parameters describing the spatial image may comprise parameters descriptive of the energy levels, time differences, and/or correlations between the channels of the multi-channel audio signal. This kind of coding scheme allows efficient compression of multi-channel signals.
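  • as an illustration of such parametric coding, the following sketch computes a mono downmix together with per-band level-difference and coherence cues for one frame of a stereo signal. It is a generic BCC-style example, not the method of the present invention; the function name, the uniform band split and the cue formulas are illustrative assumptions.

```python
import numpy as np

def bcc_style_cues(left: np.ndarray, right: np.ndarray, n_bands: int = 20):
    """Mono downmix plus per-band spatial cues for one stereo frame
    (generic BCC-style sketch; uniform bands for brevity)."""
    win = np.hanning(len(left))
    L = np.fft.rfft(left * win)
    R = np.fft.rfft(right * win)
    edges = np.linspace(0, len(L), n_bands + 1, dtype=int)
    cues = []
    for b in range(n_bands):
        l, r = L[edges[b]:edges[b + 1]], R[edges[b]:edges[b + 1]]
        el = np.sum(np.abs(l) ** 2) + 1e-12
        er = np.sum(np.abs(r) ** 2) + 1e-12
        icld = 10.0 * np.log10(el / er)                          # inter-channel level difference (dB)
        icc = np.abs(np.sum(l * np.conj(r))) / np.sqrt(el * er)  # inter-channel coherence
        cues.append((icld, icc))
    downmix = 0.5 * (left + right)                               # reduced-channel sum signal
    return downmix, cues
```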
  • multiview audio is a concept that provides different aural views to an audio scene, from which a user can, for example, select the one he/she prefers.
  • multiview audio enables new functionality that may become an interesting feature e.g. in telepresence-type applications, audio/video conferencing, gaming, etc.
  • a method according to the invention is based on the idea of determining at least one directional signal as a composition of a plurality of microphone signals for a frequency band of an input frame; determining ambience signals for said frequency band of the input frame, wherein said ambience signals comprise the individual microphone signals with their contribution to the at least one directional signal subtracted; encoding said at least one directional signal using a first encoding scheme; and encoding said ambience signals using a second encoding scheme.
  • said first encoding scheme comprises a higher bitrate audio coding scheme
  • said second encoding scheme comprises a lower bitrate audio coding scheme.
  • the method further comprises determining a plurality of directional signals, wherein each directional signal is determined as a composition of a subset of said microphone signals for a frequency band of an input frame.
  • the step of determining at least one directional signal further comprises determining input energy for each microphone signal for said frequency band of the input frame for a selected time window; determining a direction angle of a sound source on the basis of the input energy of said microphone signals and microphone angles relative to a predetermined forward axis of the microphones; selecting at least one microphone having the microphone angle corresponding to the direction angle of the sound source; and creating the at least one directional signal for said frequency band of the input frame as a combination of the microphone signals corresponding to the direction angle of the sound source.
  • the method further comprises repeating the steps of selecting the corresponding at least one microphone and creating the at least one directional signal for each of said frequency bands and for the selected time window; and gathering the composition of the directional signal for each of said frequency bands and for the selected time window for bitstream composition.
  • the step of determining ambience signals further comprises determining the number of microphone signals contributing to the at least one directional signal for said frequency band of the input frame; in response to the number of contributing microphone signals being one, setting the ambience signal for said frequency band of the input frame to zero; and in response to the number of contributing microphone signals being more than one, determining the ambience signal for said frequency band of the input frame by subtracting the input signal of each of said microphone signals from said at least one directional signal.
  • the method further comprises examining correlations between a plurality of microphone signals, and determining the ambience signal on the basis of microphone signals having correlation that meets predetermined criteria.
  • prior to determining the at least one directional signal, the method further comprises transforming the plurality of microphone signals into the frequency domain; and dividing the plurality of microphone signals into frequency-domain subbands complying with the Equivalent Rectangular Bandwidth (ERB) scale.
  • the arrangement according to the invention provides significant advantages.
  • a major advantage is that the arrangement enables a high-quality encoding framework for multiview audio at remarkably low bitrates. Generally, it allows moving from loudspeaker-position-controlled coding towards audio-scene-based coding, where the events in the audio scene are analyzed and coded efficiently. Thus, it enables the end user to select different aural views from an audio recording that contains multiple aural views to the audio scene.
  • a method according to the invention is based on the idea of receiving at least one encoded directional signal and encoded ambience signals for a frequency band of an input frame from the multiview audio signal; decoding said at least one encoded directional signal using a first decoding scheme; decoding said encoded ambience signals using a second decoding scheme; reconstructing the at least one directional signal for said frequency band of the input frame on the basis of its composition information; reconstructing the ambience signals for said frequency band of the input frame on the basis of their composition information; and reconstructing at least one microphone signal for said frequency band of the input frame by updating the at least one directional signal with the corresponding ambience signals.
  • Fig. 1 shows an example of a generic multiview audio capture and rendering system
  • Fig. 2 shows a schematic example of the composition of a directional signal for 4 microphone signals and the corresponding ambience signals in the time-frequency domain
  • Fig. 3 shows a block diagram for the multiview audio encoding side in accordance with an embodiment of the invention
  • Fig. 4 shows an example set-up for a microphone pairing comprising four microphones
  • Fig. 5 shows a block diagram for a multiview audio rendering side in accordance with an embodiment of the invention.
  • Fig. 6 shows an electronic device according to an embodiment of the invention in a reduced block chart.
  • the embodiments are related to a multiview audio capture and rendering system, for example such as illustrated in Figure 1.
  • multiple, closely spaced microphones, each pointing toward a different angle relative to the forward axis, are used to record the audio scene.
  • the captured signals are then processed for transmission over a communication channel or a network to the rendering side.
  • the captured signals may be processed for storage in a memory medium for later consumption or transmission to the rendering side.
  • the end user can select the aural view based on his/her preference from the provided multiview audio.
  • the selection of the aural view may be made for example by an application or system processing the multiview audio in the rendering side.
  • the aural view may be selected in conjunction with a related video component or with any other media component.
  • the rendering part then provides the downmixed signal(s) from the multi-microphone recording that correspond to the selected aural view.
  • the employed microphone set-up shown in Figure 1 is only an example of an audio capturing arrangement applicable in the present embodiments, and various microphone set-ups different from the one shown in the example of Figure 1 may be used.
  • Examples of different microphone set-ups include a "traditional" multichannel audio set-up (for example a 5.1 or 7.2 channel configuration), a "traditional" multi-microphone set-up with multiple microphones placed close to each other on a linear axis, and multiple microphones set on the surface of a sphere or a hemisphere according to a desired pattern/density.
  • Further examples include a set of microphones placed in an acoustic space according to a desired pattern/density or in random (but known) positions, for example to provide different aural views into a room or other corresponding acoustic space.
  • Figure 2 shows the composition of the directional signal for 4 microphone signals and the corresponding ambience signals in the time-frequency domain.
  • to determine the directional signal, for each considered frequency band of an input audio frame the one or more microphones contributing to the directional signal are determined, as illustrated in diagram A of Figure 2.
  • the directional signal comprises signal from microphone 1
  • the directional signal comprises signal from microphone 2
  • the directional signal comprises signal from microphone 3
  • the directional signal comprises signal from microphone 4.
  • similarly, for each microphone the frequency bands of an input audio frame in which the microphone contributes to the ambience signal are determined, as illustrated for microphones 1 to 4 in Figure 2 (diagrams B to E).
  • when considering microphone 1, for the audio frames t1 and t2 microphone 1 contributes to the ambience signal in the frequency bands from f2 to f5.
  • for the audio frame t3, microphone 1 contributes to the ambience signal in the frequency bands f1, f3, f4 and f5.
  • in the remaining audio frames, microphone 1 contributes to the ambience signal in all frequency bands from f1 to f5.
  • multiple directional signals could be determined, for example, based on particular subsets of the microphone signals.
  • a first directional signal could be determined e.g. on the basis of signals from microphones 1 and 2
  • a second directional signal could then be determined on the basis of signals from microphones 3 and 4. Consequently, at least one directional signal is determined for each considered time-frequency tile.
  • multiple directional signals may be determined over a full set of microphone signals. Considering the same example as above, a first directional signal and a second directional signal are determined on the basis of signals from microphones 1 - 4.
  • the number of directional signals may be different from one time-frequency tile to another. Furthermore, in an embodiment, it is possible to have zero or more directional signals for each considered time-frequency tile.
  • Figure 3 shows a high level block diagram for the encoding side in accordance with an embodiment.
  • the N microphone signals are passed to the audio scene analyzer 300.
  • the analyzer derives information about the characteristics of the microphone signals for determination of directional and ambience signals, for example by determining the direction vectors for each considered time-frequency tile. Based on the direction information, at least one directional signal is then determined in the directional signal composition unit 302.
  • the ambience signals are determined in the ambience signal composition unit 304 by removing from each individual microphone signal its contribution to the related directional signal, as received from the directional signal composition unit 302.
  • the directional signal(s), including the main audio scene information, and the ambience signals are encoded.
  • the directional signal(s) are encoded using a first compression encoder 306 that provides high quality audio (at high bitrate), while the ambience signals are encoded using a second compression encoder 308 that may provide lower quality audio (at lower bitrate). This approach advantageously keeps the overall bitrate used to represent the multiview audio scene within reasonable limits.
  • the encoded signals and the multiview scene composition information are provided for a bitstream composer 310 for encapsulation in a format suitable for transmission or storage.
  • the multiview scene composition information comprises composition information for directional signal(s) and composition information for ambience signals.
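  • a minimal sketch of this data flow for one frame is given below; specs is an (N, bins) array of frequency-domain microphone signals, mask is a boolean (N, bins) array produced by the audio scene analyzer 300 marking the microphones contributing to the directional signal, and hq_encode/lq_encode stand in for the first and second compression encoders 306 and 308. All names are hypothetical, and averaging is only one of the combination rules discussed later.

```python
import numpy as np

def encode_multiview_frame(specs, mask, hq_encode, lq_encode):
    """Sketch of the Figure 3 data flow (hypothetical names)."""
    counts = mask.sum(axis=0)                       # contributors per frequency bin
    directional = np.where(
        counts > 0,
        (specs * mask).sum(axis=0) / np.maximum(counts, 1),
        0.0)                                        # unit 302: average of contributing mics
    ambience = np.where(
        mask,
        np.where(counts > 1, directional - specs, 0.0),
        specs)                                      # unit 304, cf. equation (4) below
    return {"directional": hq_encode(directional),  # first compression encoder 306
            "ambience": [lq_encode(a) for a in ambience],  # second encoder 308
            "composition": mask}                    # scene composition info for composer 310
```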
  • the first and second compression encoders 306 and 308, respectively, may use any suitable compression coding techniques.
  • the implementation of the embodiments is not limited by the selected technique, but coding approaches such as AAC (Advanced Audio Coding), HE-AAC (High Efficiency Advanced Audio Coding), and ITU-T G.718 are suitable candidates for this purpose.
  • for example, a high-bitrate encoding mode of a codec, e.g. the G.718 codec, may be used as the first encoding scheme, and a low-bitrate encoding mode of the same codec as the second encoding scheme.
  • the bitstream composer 310 may act as a multiplexer, encapsulating the encoded signals and the multiview scene composition information into a single bitstream unit for transmission or storage.
  • the encoded signals and the multiview scene information are encapsulated into two or more bitstream units by the bitstream composer 310.
  • the directional signal(s) and the multiview scene information may be encapsulated into one bitstream unit, while the ambience signals are encapsulated into one or more bitstream units.
  • when at least one directional signal is determined for each of a number of subsets of microphone signals, the data is encapsulated into a corresponding number of bitstream units, each carrying the directional signal(s), the ambience signals and the multiview scene information corresponding to the respective subset of microphone signals.
  • the bitstream may comprise indication about mapping between the encoded signals and respective microphone identifier to support signal reconstruction in the rendering side. Furthermore, the bitstream may comprise information regarding the compression method, i.e. encoding, applied for the directional signal(s) and for the ambience signals.
  • the microphone signals are first transformed into frequency domain.
  • the frequency domain representation may be obtained using DFT (Discrete Fourier Transform), MDCT/MDST (Modified Discrete Cosine/Sine Transform), QMF (Quadrature Mirror Filter), complex valued QMF or any other transform that provides frequency domain output.
  • the spectral bins are divided into psycho-acoustically motivated frequency subbands.
  • nonuniform subbands are used that more closely reflect the auditory sensitivity.
  • in an embodiment, the non-uniform bands follow the boundaries of the ERB (Equivalent Rectangular Bandwidth) scale, while other embodiments of the invention may use any suitable division into frequency bands.
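  • a sketch of one possible band division is given below: it maps DFT bins to subbands that approximate the ERB scale using the Glasberg and Moore formula (ERB number = 21.4 log10(1 + 0.00437 f)), returning an sbOffset-style array of band start indices. The exact banding used in the embodiments is not specified, so the function and its parameters are assumptions.

```python
import numpy as np

def erb_subband_offsets(n_bins: int, fs: float, bands_per_erb: float = 1.0):
    """sbOffset-style subband boundaries approximating the ERB scale."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)        # centre frequency of each DFT bin
    erb_num = 21.4 * np.log10(1.0 + 0.00437 * freqs)  # ERB number (Cams) per bin
    n_bands = int(np.ceil(erb_num[-1] * bands_per_erb))
    edges = np.arange(n_bands + 1) / bands_per_erb    # band boundaries in Cams
    sb_offset = np.searchsorted(erb_num, edges)       # first bin of each band
    sb_offset[-1] = n_bins                            # close the last band at Nyquist
    return np.unique(sb_offset)                       # merge empty low-frequency bands

# e.g. erb_subband_offsets(513, 48000.0) for a 1024-point DFT at 48 kHz (~43 bands)
```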
  • the analyzer 300 calculates direction vectors for each considered time- frequency tile.
  • the direction vector, described in polar coordinates, indicates the sound event's radial position and direction angle with respect to the forward axis.
  • the input signal energy for microphone n at frequency band m over time window T may be computed as

    $e_n(m) = \sum_{t \in T} \; \sum_{j=\mathrm{sbOffset}[m]}^{\mathrm{sbOffset}[m+1]-1} \lvert f_{t,n}(j) \rvert^2 \qquad (1)$

    where $f_{t,n}$ is the frequency domain representation of the n-th microphone signal at time instant t and sbOffset[] describes the subband boundaries.
  • the perceived direction of a source within the time window T is determined for each considered subband.
  • the localization is defined as

    $\vec{v}(m) = \sum_{n=0}^{N-1} e_n(m) \, \big(\cos\theta_n, \; \sin\theta_n\big) \qquad (2)$

    where $\theta_n$ describes the microphone angles relative to the forward axis.
  • the direction angle of the sound events is then determined as

    $\alpha(m) = \operatorname{atan2}\!\big(v_y(m), \, v_x(m)\big) \qquad (3)$
  • Equations (2) and (3) are repeated for 0 ≤ m < M.
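  • a sketch of the analysis of equations (1)-(3) is given below; the energy-weighted localization vector and the atan2-based angle are assumptions consistent with the description of the analyzer, not a verbatim transcription.

```python
import numpy as np

def direction_angles(specs, theta, sb_offset):
    """Per-subband direction estimate following equations (1)-(3).
    specs: (T, N, bins) frequency-domain microphone signals over window T.
    theta: (N,) microphone angles in radians relative to the forward axis.
    sb_offset: subband start indices, length M + 1."""
    M = len(sb_offset) - 1
    angles = np.empty(M)
    for m in range(M):
        band = specs[:, :, sb_offset[m]:sb_offset[m + 1]]
        e = np.sum(np.abs(band) ** 2, axis=(0, 2))  # equation (1): energy e_n(m)
        vx = np.sum(e * np.cos(theta))              # equation (2): localization vector
        vy = np.sum(e * np.sin(theta))
        angles[m] = np.arctan2(vy, vx)              # equation (3): direction angle
    return angles
```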
  • the directional signal is determined for each considered time-frequency tile based on the direction angle, for example according to the following steps. In step 1, the microphone or microphones whose angles $\theta_n$ correspond to the direction angle of the sound source are selected; the vector J holds the indices of the selected microphones and $K_m$ is the vector size of J. In step 2, the directional signal is composed for each frequency bin of the subband as

    $\mathrm{Dir}_t(j) = g_{t,m}(j), \qquad \mathrm{sbOffset}[m] \le j < \mathrm{sbOffset}[m+1]$

    where $g_{t,m}(j)$ is a combination of the microphone signals $f_{t,J(0)}(j), \dots, f_{t,J(K_m-1)}(j)$.
  • the directional signal $\mathrm{Dir}_t(j)$ for frequency bin j in input frame t is thus the combination of the microphone signals found in step 1.
  • the combination of the signals may be, for example, an average of the signals or a suitable weighted sum of the signals.
  • the directional signal may be composed using a method different from the one described above.
  • the direction angle used above for the directional signal composition may be determined based on a TDOA (time delay of arrival) scheme, which tracks the time delay between the signals captured by different microphones and then calculates the source direction based on the time delay estimation.
  • An example of a TDOA scheme is given in J.-M. Valin, F. Michaud, J. Rouat and D. Létourneau, "Robust sound source localization using a microphone array on a mobile robot", Proceedings of the 2003 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2003), Vol. 2, 27-31 Oct. 2003, pp. 1228-1233.
  • the directional signal for a time-frequency tile may be composed by selecting the microphone signal that has the highest energy in the respective time-frequency tile.
  • a combination of microphone signals may be used to compose the directional signal in case several microphone signals have high energy in the same time-frequency tile.
  • microphone signals having energy exceeding a predetermined (fixed or adaptive) threshold may be selected for the directional signal composition.
  • the microphone signal having the highest energy is selected for the directional signal composition, together with other microphone signals having energy within a predetermined (fixed or adaptive) margin with respect to the microphone signal with the highest energy.
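  • a sketch of this energy-based selection is given below; the 6 dB margin is an illustrative assumption, not a value from the embodiments.

```python
import numpy as np

def select_contributors(specs, sb_offset, m, margin_db: float = 6.0):
    """Select the microphones for the directional signal of subband m:
    the highest-energy microphone plus all others within margin_db of it."""
    band = specs[:, sb_offset[m]:sb_offset[m + 1]]         # (N, band_bins)
    e_db = 10.0 * np.log10(np.sum(np.abs(band) ** 2, axis=1) + 1e-12)
    return np.flatnonzero(e_db >= e_db.max() - margin_db)  # vector J of selected mics
```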
  • various techniques based on the characteristics of the microphone signals may be used for composition of the directional signal(s).
  • the ambience signal for microphone n in frequency bin j of input frame t is determined as follows:

    $A_{t,n}(j) = \begin{cases} 0, & n \in J \text{ and } K_m = 1 \\ f_{t,n}(j), & n \notin J \\ \mathrm{Dir}_t(j) - f_{t,n}(j), & n \in J \text{ and } K_m > 1 \end{cases} \qquad (4)$

  • if only one microphone $n_1$ contributes to the directional signal, the ambience signal in frequency bin j of input frame t for microphone $n_1$ is set to zero, whereas for the other microphones the respective ambience signal is set to a value equal to the input signal in frequency bin j in input frame t.
  • the ambience signal for a microphone n partially contributing to the directional signal is determined by subtracting the input signal in frequency bin j in input frame t for microphone n from the directional signal determined for frequency bin j in input frame t. Equation (4) is repeated for 0 ≤ m < M, 0 ≤ n < N and t ∈ T, i.e. for each subband and for each microphone over the selected time window.
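  • equation (4) may be transcribed for one subband as follows, with J and $K_m$ as in the directional signal composition above.

```python
import numpy as np

def ambience_subband(specs, directional, J, sb_offset, m):
    """Equation (4) for subband m: zero for a sole contributor, the raw input
    for non-contributing microphones, Dir - input for partial contributors."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    A = specs[:, lo:hi].copy()                       # default: ambience = input signal
    if len(J) == 1:
        A[J[0]] = 0.0                                # sole contributor: ambience is zero
    else:
        A[J] = directional[lo:hi] - specs[J, lo:hi]  # partial contributors
    return A
```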
  • parametric multi-channel coding techniques such as BCC may be used to encode the ambience signals at low bitrates without compromising the audio quality. This may be especially beneficial in cases where there are correlations between the ambience signals.
  • the data rate of the ambience signals may be further reduced by exploiting the possible correlations that may exist between microphone signals.
  • one such method, exploiting the correlations between two microphone signals, is given below.
  • the inter-microphone correlation for microphones x and y and for frequency band m in input frame t may be computed as the normalised cross-correlation

    $\mathrm{im\_corr}_t^{\,(x,y)}(m) = \frac{\big| \sum_{j=\mathrm{sbOffset}[m]}^{\mathrm{sbOffset}[m+1]-1} f_{t,x}(j) \, f_{t,y}^{*}(j) \big|}{\sqrt{\sum_{j} \lvert f_{t,x}(j) \rvert^2 \; \sum_{j} \lvert f_{t,y}(j) \rvert^2}} \qquad (5)$

    The value of $\mathrm{im\_corr}_t^{\,(x,y)}(m)$ in equation (5) approaches unity for highly correlated microphone signals, and approaches zero for uncorrelated microphone signals.
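  • a transcription of equation (5) is given below; the normalised cross-correlation form is one consistent reading of the stated behaviour, not a verbatim formula from the embodiments.

```python
import numpy as np

def im_corr(specs, x, y, sb_offset, m):
    """Inter-microphone correlation im_corr_t(x,y)(m) of equation (5),
    written as a normalised cross-correlation over subband m."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    fx, fy = specs[x, lo:hi], specs[y, lo:hi]
    num = np.abs(np.sum(fx * np.conj(fy)))
    den = np.sqrt(np.sum(np.abs(fx) ** 2) * np.sum(np.abs(fy) ** 2)) + 1e-12
    return num / den          # near 1 for correlated, near 0 for uncorrelated signals
```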
  • Figure 4 shows an example set-up for the microphone pairing comprising four microphones, for which inter-microphone correlation calculations are carried out in order to find the microphone pairs (x, y) having correlation that meets predetermined criteria.
  • Such microphone pairs may be determined on a subband-by-subband basis, for example according to the following steps.
  • In step 1, a decision is made for each microphone pair (x, y) whether the microphone signals are correlated. In an embodiment, a correlation value of 0.8 is used as the threshold of significant correlation in the decision making, while other embodiments of the invention may apply a different threshold or different criteria.
  • Each microphone signal within a correlated pair is then assigned one gain value that indicates the amount of scaling needed with respect to the sum signal, where the sum is composed of the microphone signals within the pair. After the gain values have been calculated, the sum signal is generated for encoding. In step 4, each microphone signal gets assigned the sum signal, but optimal implementations naturally encode the sum signal only once. In step 5, the joint coding status information imc_ind_t(x,y)(m) and, if joint coding was enabled, the gain values g_t,x(m) and g_t,y(m) are sent to the bitstream composer, as the sketch below illustrates.
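  • the pairwise decision, sum signal and gain calculation may be sketched as follows, reusing im_corr from the sketch above; the gain formula (energy of a microphone signal relative to the sum signal) is an assumption, since the embodiments do not spell it out.

```python
import numpy as np

def joint_code_pair(specs, x, y, sb_offset, m, threshold: float = 0.8):
    """Joint coding decision for pair (x, y) in subband m."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    if im_corr(specs, x, y, sb_offset, m) < threshold:
        return {"imc_ind": 0}                             # no joint coding for this band
    s = specs[x, lo:hi] + specs[y, lo:hi]                 # sum signal, encoded only once
    es = np.sum(np.abs(s) ** 2) + 1e-12
    g_x = np.sqrt(np.sum(np.abs(specs[x, lo:hi]) ** 2) / es)  # scaling w.r.t. sum signal
    g_y = np.sqrt(np.sum(np.abs(specs[y, lo:hi]) ** 2) / es)
    return {"imc_ind": 1, "sum": s, "gains": (g_x, g_y)}  # to the bitstream composer
```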
  • determining of the correlations is not limited to microphone pairs only (i.e. two microphones), but the correlations between more than two microphones may be analyzed and exploited in order to reduce data rate required for the ambience signal(s).
  • the encoding embodiments described above provide significant advantages.
  • the arrangement described above enables a high-quality encoding framework for multiview audio at remarkably low bitrates. Generally, it allows moving from loudspeaker-position-controlled coding towards audio-scene-based coding, where the events in the audio scene are analyzed and coded efficiently.
  • Figure 5 shows a high level block diagram for the rendering side in accordance with an embodiment.
  • the received bitstream is first decomposed in the bitstream decomposer 500 in order to obtain the encoded directional and ambience signals.
  • the encoded directional signal(s) are decoded using a compression decoder 502 matching the compression encoder used for encoding the directional signal(s) in the encoding side
  • the encoded ambience signals are decoded using a compression decoder 504 matching the compression encoder used for encoding the ambience signal(s) in the encoding side.
  • the directional signals are then reconstructed with the help of the directional composition information 506, and the ambience signals are reconstructed with the help of the ambience composition information 508.
  • the multiview audio signals are then reconstructed in the multiview reconstruction unit 510 on the basis of the reconstructed directional signals, which are updated in accordance with the reconstructed ambience signal.
  • the multiview audio renderer 512 extracts the desired aural view for the listener, which is then reproduced to the listener using audio reproduction means 514, such as headphones or loudspeakers. It is to be noted that the implementation of the multiview audio renderer and the audio reproduction is not relevant for the implementation of the invention.
  • the microphone signals are reconstructed based on the received directional signal(s) and the received composition information for the directional signal as follows:

    $\hat{f}_{t,J(n)}(j) = \widehat{\mathrm{Dir}}_t(j), \qquad \mathrm{sbOffset}[m] \le j < \mathrm{sbOffset}[m+1], \quad 0 \le n < K_m \qquad (6)$

  • Equation (6) is repeated for 0 ≤ m < M and t ∈ T.
  • the process of equation (6) is repeated for each directional signal using the respective composition information.
  • the reconstructed microphone signals are updated by using the received composition information for the ambience signals according to

    $\hat{f}_{t,n}(j) = \begin{cases} \widehat{\mathrm{Dir}}_t(j) + B_{t,n}(j), & n \in J \text{ and } K_m > 1 \\ \widehat{\mathrm{Dir}}_t(j), & n \in J \text{ and } K_m = 1 \\ B_{t,n}(j), & n \notin J \end{cases} \qquad (7)$

    where $B_{t,n}$ is the decoded ambience signal for microphone signal n in frame t. If there is more than one microphone signal contributing to the directional signal, the ambience signal is added to the reconstructed directional signal; otherwise the reconstructed directional signal is used as such to reconstruct the n-th microphone signal for frequency band m in input frame t.
  • the microphone signals that do not contribute to the reconstructed directional signal use the reconstructed ambience signal as such to reconstruct the microphone signals for frequency band m in input frame t.
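  • equations (6) and (7) may be transcribed for one subband as follows, with dir_hat and B_hat denoting the decoded directional and ambience signals.

```python
import numpy as np

def reconstruct_mics(dir_hat, B_hat, J, sb_offset, m):
    """Equations (6) and (7) for subband m: contributing microphones start
    from the decoded directional signal and are updated with their ambience;
    non-contributing microphones use the decoded ambience as such."""
    lo, hi = sb_offset[m], sb_offset[m + 1]
    f_hat = B_hat[:, lo:hi].copy()       # non-contributors: ambience signal as such
    f_hat[J] = dir_hat[lo:hi]            # equation (6): initialise from Dir
    if len(J) > 1:
        f_hat[J] += B_hat[J, lo:hi]      # equation (7): add ambience to contributors
    return f_hat                         # (N, band_bins) reconstructed mic signals
```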
  • the decoded ambience signal may be reconstructed for channel pairs, for example according to the following steps. In step 1, the indicators imc_receiver_ind_t,x(m) and imc_receiver_ind_t,y(m) track whether the ambience signal of each microphone within the pair has already been obtained for the frequency band in question; an indicator changes from 0 to 1 once the respective ambience signal has been received or reconstructed.
  • in step 2, if no side information has been received for either of the microphone signals within the pair, the joint coding status information is read from the bitstream. If the joint coding information indicates that a shared ambience signal is used for the frequency band in question, the gain values are read for the microphone signals. Finally, the ambience signals are reconstructed for each microphone signal within the pair by simply multiplying the decoded ambience signal by the gain value.
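  • a minimal sketch of this step 2, under the assumption that a single decoded (sum) ambience signal and two received gain values are available for the pair:

```python
def decode_pair_ambience(sum_ambience, g_x, g_y):
    """Step 2 of the pairwise reconstruction: the one decoded (sum) ambience
    signal is copied to both microphones of the pair and scaled by the
    received gains (names mirror the encoding-side sketch above)."""
    return g_x * sum_ambience, g_y * sum_ambience   # ambience for microphones x and y
```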
  • in step 2 it is again assumed that only one (sum) ambience signal is decoded from the bitstream and copied to the relevant microphone signal frequency bands before the multiplication.
  • since the bitrate required for the transmission of the encoded multiview audio signal is reasonably low, the invention is especially well applicable in systems wherein the available bandwidth is a scarce resource, such as in wireless communication systems.
  • the embodiments are especially applicable in mobile terminals or in other portable devices typically lacking high-quality loudspeakers, wherein the features of multiview audio can be introduced through headphones.
  • a further field of viable applications includes teleconferencing services, wherein the participants of the teleconference can be easily distinguished by giving the listeners the choice to select a desired aural view, e.g. to follow the direction of the participant who is currently speaking in the conference call.
  • FIG. 6 illustrates a simplified structure of a data processing device (TE), wherein the multiview audio encoding system according to the invention can be implemented.
  • the data processing device (TE) can be, for example, a mobile terminal, an MP3 player, a PDA device or a personal computer (PC).
  • the data processing unit (TE) comprises I/O means (I/O), a central processing unit (CPU) and memory (MEM).
  • the memory (MEM) comprises a read-only memory ROM portion and a rewriteable portion, such as a random access memory RAM and FLASH memory.
  • the information used to communicate with different external parties, e.g. other devices and the user, is transmitted through the I/O means (I/O) to/from the central processing unit (CPU).
  • if the data processing device is implemented as a mobile station, it typically includes a transceiver Tx/Rx, which communicates with the wireless network, typically with a base transceiver station (BTS), through an antenna.
  • User Interface (UI) equipment typically includes a display, a keypad, a microphone and connecting means for headphones.
  • the data processing device may further comprise connecting means MMC, such as a standard form slot, for various hardware modules or integrated circuits IC, which may provide various applications to be run in the data processing device.
  • the multiview audio encoding system may be executed in a central processing unit CPU or in a dedicated digital signal processor DSP (a parametric code processor) of the data processing device, whereby the data processing device receives the plurality of microphone signals.
  • the plurality of microphone signals may be received directly from microphones or from memory means, e.g. a CD-ROM, or from a wireless network via the antenna and the transceiver Tx/Rx.
  • the CPU or the DSP carries out the steps of determining the directional and ambience signals, and the data processing device preferably further comprises suitable audio encoders for encoding the directional and ambience signals as described above.
  • the decoding system may as well be executed in a central processing unit CPU or in a dedicated digital signal processor DSP of the data processing device, whereby the data processing device comprises means, e.g. a demultiplexer, for separating the encoded directional signal and the encoded ambience signals from the multiview audio signal.
  • the data processing device preferably further comprises suitable audio decoders for decoding the encoded directional signal and ambience signals separately with audio decoding schemes indicated in composition information of said signals and then the CPU or the DSP carries out the steps of reconstructing the at least one directional signal on the basis of its composition information, reconstructing the ambience signals on the basis of their composition information, and reconstructing at least one multiview audio signal by updating the at least one directional signal with the corresponding ambience signals.
  • the functionalities of the invention may be implemented in a terminal device, such as a mobile station, also as a computer program which, when executed in a central processing unit CPU or in a dedicated digital signal processor DSP, causes the terminal device to implement procedures of the invention.
  • Functions of the computer program SW may be distributed to several separate program components communicating with one another.
  • the computer software may be stored on any memory means, such as the hard disk of a PC or a CD-ROM disc, from where it can be loaded into the memory of a mobile terminal.
  • the computer software can also be loaded through a network, for instance using a TCP/IP protocol stack.
  • the above computer program product can be at least partly implemented as a hardware solution, for example as ASIC or FPGA circuits, in a hardware module comprising connecting means for connecting the module to an electronic device, or as one or more integrated circuits IC, the hardware module or the ICs further including various means for performing said program code tasks, said means being implemented as hardware and/or software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Mathematical Physics (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Stereophonic System (AREA)

Abstract

A method for encoding a multiview audio signal, the method comprising: determining at least one directional signal as a composition of a plurality of microphone signals for a frequency band of an input frame (302); determining ambience signals for said frequency band of the input frame (304), said ambience signals comprising an individual microphone signal with its contribution to said directional signal subtracted; and encoding said directional signal (306) and said ambience signals (308) separately.
PCT/FI2009/050343 2009-04-30 2009-04-30 Encoding of multiview audio signals WO2010125228A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/FI2009/050343 WO2010125228A1 (fr) 2009-04-30 2009-04-30 Encoding of multiview audio signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/FI2009/050343 WO2010125228A1 (fr) 2009-04-30 2009-04-30 Encoding of multiview audio signals

Publications (1)

Publication Number Publication Date
WO2010125228A1 true WO2010125228A1 (fr) 2010-11-04

Family

ID=43031746

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/FI2009/050343 WO2010125228A1 (fr) 2009-04-30 2009-04-30 Encoding of multiview audio signals

Country Status (1)

Country Link
WO (1) WO2010125228A1 (fr)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
• EP2464145A1 (fr) * 2010-12-10 2012-06-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
• WO2013064860A1 (fr) * 2011-10-31 2013-05-10 Nokia Corporation Audio scene rendering by aligning series of time-varying feature data
• EP2641244A1 (fr) * 2010-11-19 2013-09-25 Nokia Corp. Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof
US9257130B2 (en) 2010-07-08 2016-02-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoding/decoding with syntax portions using forward aliasing cancellation
US9313599B2 (en) 2010-11-19 2016-04-12 Nokia Technologies Oy Apparatus and method for multi-channel signal playback
US9706324B2 (en) 2013-05-17 2017-07-11 Nokia Technologies Oy Spatial object oriented audio apparatus
US9794686B2 (en) 2010-11-19 2017-10-17 Nokia Technologies Oy Controllable playback system offering hierarchical playback options
• RU2635244C2 (ru) * 2013-01-22 2017-11-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US9883314B2 (en) 2014-07-03 2018-01-30 Dolby Laboratories Licensing Corporation Auxiliary augmentation of soundfields
US20180206039A1 (en) * 2015-07-08 2018-07-19 Nokia Technologies Oy Capturing Sound
US10148903B2 (en) 2012-04-05 2018-12-04 Nokia Technologies Oy Flexible spatial audio capture apparatus
US10635383B2 (en) 2013-04-04 2020-04-28 Nokia Technologies Oy Visual audio processing apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070269063A1 (en) * 2006-05-17 2007-11-22 Creative Technology Ltd Spatial audio coding based on universal spatial cues
US20080004729A1 (en) * 2006-06-30 2008-01-03 Nokia Corporation Direct encoding into a directional audio coding format
US20090086998A1 (en) * 2007-10-01 2009-04-02 Samsung Electronics Co., Ltd. Method and apparatus for identifying sound sources from mixed sound signal

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9257130B2 (en) 2010-07-08 2016-02-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoding/decoding with syntax portions using forward aliasing cancellation
• EP2641244A4 (fr) * 2010-11-19 2015-03-25 Nokia Corp Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof
US10477335B2 (en) 2010-11-19 2019-11-12 Nokia Technologies Oy Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof
US9794686B2 (en) 2010-11-19 2017-10-17 Nokia Technologies Oy Controllable playback system offering hierarchical playback options
US9456289B2 (en) 2010-11-19 2016-09-27 Nokia Technologies Oy Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof
US9313599B2 (en) 2010-11-19 2016-04-12 Nokia Technologies Oy Apparatus and method for multi-channel signal playback
• EP2641244A1 (fr) * 2010-11-19 2013-09-25 Nokia Corp. Converting multi-microphone captured signals to shifted signals useful for binaural signal processing and use thereof
• CN103355001B (zh) * 2010-12-10 2016-06-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
US10187725B2 (en) 2010-12-10 2019-01-22 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decomposing an input signal using a downmixer
• JP2014502478A (ja) * 2010-12-10 2014-01-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• CN103355001A (zh) * 2010-12-10 2013-10-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
US9241218B2 (en) 2010-12-10 2016-01-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• CN103348703A (zh) * 2010-12-10 2013-10-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• EP2464146A1 (fr) * 2010-12-10 2012-06-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• EP2464145A1 (fr) * 2010-12-10 2012-06-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
• CN103348703B (zh) * 2010-12-10 2016-08-10 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• WO2012076331A1 (fr) * 2010-12-10 2012-06-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a pre-calculated reference curve
• JP2014502479A (ja) * 2010-12-10 2014-01-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
• WO2012076332A1 (fr) * 2010-12-10 2012-06-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for decomposing an input signal using a downmixer
• WO2013064860A1 (fr) * 2011-10-31 2013-05-10 Nokia Corporation Audio scene rendering by aligning series of time-varying feature data
US10148903B2 (en) 2012-04-05 2018-12-04 Nokia Technologies Oy Flexible spatial audio capture apparatus
US10419712B2 (en) 2012-04-05 2019-09-17 Nokia Technologies Oy Flexible spatial audio capture apparatus
• RU2635244C2 (ru) * 2013-01-22 2017-11-09 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US10482888B2 (en) 2013-01-22 2019-11-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for spatial audio object coding employing hidden objects for signal mixture manipulation
US10635383B2 (en) 2013-04-04 2020-04-28 Nokia Technologies Oy Visual audio processing apparatus
US9706324B2 (en) 2013-05-17 2017-07-11 Nokia Technologies Oy Spatial object oriented audio apparatus
US9883314B2 (en) 2014-07-03 2018-01-30 Dolby Laboratories Licensing Corporation Auxiliary augmentation of soundfields
US20180206039A1 (en) * 2015-07-08 2018-07-19 Nokia Technologies Oy Capturing Sound
• EP3320677A4 (fr) * 2015-07-08 2019-01-23 Nokia Technologies OY Capturing sound
US11115739B2 (en) 2015-07-08 2021-09-07 Nokia Technologies Oy Capturing sound
US11838707B2 (en) 2015-07-08 2023-12-05 Nokia Technologies Oy Capturing sound

Similar Documents

Publication Publication Date Title
WO2010125228A1 (fr) Encoding of multiview audio signals
EP3107094B1 (fr) Compressing decomposed representations of a sound field
CN106663433B (zh) Method and apparatus for processing audio data
US20060198542A1 (en) Method for the treatment of compressed sound data for spatialization
US9565314B2 (en) Spatial multiplexing in a soundfield teleconferencing system
US9219972B2 (en) Efficient audio coding having reduced bit rate for ambient signals and decoding using same
US20080004729A1 (en) Direct encoding into a directional audio coding format
JP5227946B2 (ja) Filter adaptive frequency resolution
WO2021130405A1 (fr) Combining of spatial audio parameters
JP2024512953A (ja) Combining spatial audio streams
WO2021130404A1 (fr) Merging of spatial audio parameters
EP3818730A1 (fr) Energy ratio signalling and synthesis
EP3530004A1 (fr) System and method for digital content management
WO2010105695A1 (fr) Multichannel audio coding
CN114582357A (zh) Audio encoding and decoding method and apparatus
CN115346537A (zh) Audio encoding and decoding method and apparatus
EP2489036A1 (fr) Method, apparatus and computer program for processing multi-channel audio signals
EP4264603A1 (fr) Quantizing spatial audio parameters
WO2022257824A1 (fr) Three-dimensional audio signal processing method and apparatus
WO2022058645A1 (fr) Spatial audio parameter encoding and associated decoding
CN115938388A (zh) Three-dimensional audio signal processing method and apparatus
CN115376527A (zh) Three-dimensional audio signal encoding method and apparatus, and encoder
CA3208666A1 (fr) Transformation of spatial audio parameters
WO2021250311A1 (fr) Spatial audio parameter encoding and associated decoding
CN115376528A (zh) Three-dimensional audio signal encoding method and apparatus, and encoder

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09843938

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09843938

Country of ref document: EP

Kind code of ref document: A1