EP4442009A1 - Parameterized modeling of coherent and incoherent sound - Google Patents
Parameterized modeling of coherent and incoherent sound
- Publication number
- EP4442009A1 (application EP22818955.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- sound
- perceptual
- acoustic parameters
- parameters
- coherent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/302—Electronic adaptation of stereophonic sound system to listener position or orientation
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/50—Controlling the output signals based on the game progress
- A63F13/54—Controlling the output signals based on the game progress involving acoustic signals, e.g. for simulating revolutions per minute [RPM] dependent engine sounds in a driving game or reverberation against a virtual wall
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S7/00—Indicating arrangements; Control arrangements, e.g. balance control
- H04S7/30—Control circuits for electronic adaptation of the sound field
- H04S7/305—Electronic adaptation of stereophonic audio signals to reverberation of the listening space
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/11—Positioning of individual sound objects, e.g. moving airplane, within a sound field
Definitions
- the description generally relates to techniques for representing acoustic characteristics of real or virtual scenes.
- One example includes a method or technique that can be performed on a computing device.
- the method or technique can include generating directional impulse responses for a scene.
- the directional impulse responses can correspond to sound departing from multiple sound source locations and arriving at multiple listener locations in the scene.
- the method or technique can also include processing the directional impulse responses to obtain coherent sound signals and incoherent sound signals.
- the method or technique can also include encoding first perceptual acoustic parameters from the coherent sound signals and second perceptual acoustic parameters from the incoherent sound signals.
- the method or technique can also include outputting the encoded first perceptual acoustic parameters and the encoded second perceptual acoustic parameters.
- Another example includes a system having a hardware processing unit and a storage resource storing computer-readable instructions.
- the computer-readable instructions can cause the system to receive an input sound signal for a sound source having a source location in a scene.
- the computer-readable instructions can also cause the system to identify encoded first perceptual acoustic parameters and encoded second perceptual acoustic parameters for a listener location in the scene.
- the encoded first perceptual acoustic parameters can represent characteristics of coherent sound signals departing the source location and arriving at the listener location.
- the encoded second perceptual acoustic parameters can represent characteristics of incoherent sound signals departing the source location and arriving at the listener location.
- the computer-readable instructions can also cause the system to render coherent sound at the listener location based at least on the input sound signal and the encoded first perceptual acoustic parameters, and render incoherent sound at the listener location based at least on the input sound signal and the encoded second perceptual acoustic parameters.
- the computer-readable storage medium can store instructions which, when executed by a computing device, cause the computing device to perform acts.
- the acts can include processing directional impulse responses corresponding to sound departing from multiple sound source locations and arriving at multiple listener locations in a scene to obtain coherent sound signals and incoherent sound signals.
- the acts can also include encoding first perceptual acoustic parameters from the coherent sound signals and second perceptual acoustic parameters from the incoherent sound signals.
- the acts can also include outputting the encoded first perceptual acoustic parameters and the encoded second perceptual acoustic parameters.
- the encoded first perceptual acoustic parameters can provide a basis for subsequent rendering of coherent sound and the encoded second perceptual acoustic parameters can provide a basis for subsequent rendering of incoherent sound traveling from various source locations to various listener locations in the scene.
- FIG. 1 illustrates a scenario of acoustic probes deployed in a virtual scene, consistent with some implementations of the present concepts.
- FIGS. 2A, 2B, 2C, and 2D illustrate scenarios related to propagation of sound, consistent with some implementations of the present concepts.
- FIGS. 3 and 4 illustrate example systems that are consistent with some implementations of the present concepts.
- FIGS. 5 and 6 are flowcharts of example methods in accordance with some implementations of the present concepts.
- FIG. 7 illustrates a schematic of a streaming encoding algorithm that is consistent with some implementations of the present concepts.
- FIG. 8 illustrates an example of a sound signal split into coherent and incoherent components, consistent with some implementations of the present concepts.
- FIG. 9 illustrates adaptive time bins that can be employed to encode coherent sound components, consistent with some implementations of the present concepts.
- FIG. 10 illustrates a rendering schematic that can be employed to render sound based on encoded parameters, consistent with some implementations of the present concepts.
- modeling and rendering of real-time acoustic effects can be very computationally intensive. As a consequence, it can be difficult to render realistic acoustic effects without sophisticated and expensive hardware. For instance, modeling acoustic characteristics of a real or virtual scene while allowing for movement of sound sources and listeners presents a difficult problem, particularly for complex scenes.
- One important factor to consider in modeling acoustic effects of a scene relates to the delay of sound arriving at the listener.
- the time at which sound waves are received by a listener conveys important information to the listener. For instance, for a given wave pulse introduced by a sound source into a scene, the pressure response arrives at the listener as a series of peaks, each of which represents a different path that the sound takes from the source to the listener.
- the timing, arrival direction, and sound energy of each peak are dependent upon various factors, such as the location of the sound source, the location of the listener, the geometry of structures present in the scene, and the materials of which those structures are composed.
- Listeners tend to perceive the direction of the first-arriving peak in the impulse response as the arrival direction of the sound, even when nearly-simultaneous peaks arrive shortly thereafter from different directions. This is known as the “precedence effect.”
- This initial sound can take the shortest path through the air from a sound source to a listener in a given scene. After the initial sound, subsequent coherent reflections (echoes) are received that generally take longer paths reflecting off of various surfaces in the scene and become attenuated over time. In addition, humans can perceive reverberant noise together with initial sound and subsequent echoes.
- initial sounds tend to enable listeners to perceive where the sound is coming from.
- Subsequent echoes and/or reverberations tend to provide listeners with additional information about the scene because they convey how sound travels along many different paths within the scene. For instance, echoes can be perceived differently by the listener depending on properties of the scene. As an example, when a sound source and listener are nearby (e.g., within a few footsteps), a delay between arrival of the initial sound and the corresponding first echoes can become audible. The delay between the initial sound and the echoes can strengthen the perception of distance to walls.
- Initial sound and subsequent echoes can be produced by coherent sound waves that have the same frequency and a particular phase relationship, e.g., in-phase with one another.
- reverberations can be produced by incoherent sound waves having many different frequencies and phases.
- the disclosed implementations can separate a sound signal into coherent and incoherent components, and encode separate parameter sets for the coherent and incoherent components. These separate parameters provide a basis for subsequent rendering of coherent sound and incoherent sound traveling from different source locations to different listener locations within the scene.
- One high-level approach for reducing the computational burden of rendering sound involves precomputing acoustic parameters characterizing how sound travels from different source locations to different listener locations in a given virtual scene. Once these acoustic parameters are precomputed, they are invariant provided that the scene does not change.
- the term "precompute" is used to refer to determining acoustic parameters of a scene offline, while the term "runtime" refers to using those acoustic parameters during execution of an application to perform actions such as rendering sound to account for changes to source location and/or listener location.
- One simplifying assumption for parameterizing sound in a given scene is to designate different non-overlapping temporal periods for initial sound, reflections, and reverberations.
- Initial sound can be modeled as coherent sound in a first time period, coherent reflections can be modeled as coherent sound in a second time period, and reverberations can be modeled as decaying noise in a third time period.
- This approach can provide sufficient fidelity for certain applications such as video games while providing compact encodings, because initial sound and coherent reflections tend to predominate human perception of sound that arrives early at the listener for a given sound event, and reverberations tend to predominate perception of later-arriving sound.
- coherent sound signals and incoherent sound signals can have very different characteristics that vary with source and listener location in a scene.
- Approaches that extract parameters from the full impulse response pressure signal may not accurately represent different characteristics of coherent and incoherent sound signals.
- the disclosed implementations can address these issues and generate convincing sound for various applications, such as architectural acoustics, by splitting the impulse response pressure signal into separate coherent and incoherent components. Separate parameter sets can be derived for the coherent and incoherent components, allowing greater fidelity and simultaneous rendering of coherent and incoherent sound components.
- first parameters used to represent coherent sound can be derived from a coherent sound signal component with relatively little sound energy from incoherent sound waves
- second parameters used to represent incoherent sound can be derived from an incoherent sound signal component with relatively little sound energy from coherent sound waves.
- incoherent sound has relatively little impact on the first parameters that represent coherent sound
- coherent sound has relatively little impact on the second parameters that represent incoherent sound.
- the disclosed implementations offer computationally efficient mechanisms for accurately modeling and rendering acoustic effects that account for different characteristics of coherent and incoherent sound in a given scene.
- the disclosed implementations can model a given scene using perceptual parameters that represent how sound is perceived at different source and listener locations within the scene. Once perceptual parameters have been obtained for a given scene as described herein, the perceptual parameters can be used for rendering of sound traveling from arbitrary source and listener positions in the scene, e.g., by interpolating stored parameters for source and listener locations that are nearby the runtime source and listener positions.
- the disclosed implementations can precompute acoustic parameters of a scene and then use the precomputed information at runtime to render sound.
- these precomputed acoustic parameters can be considered “perceptual” parameters because they describe how sound is perceived by listeners in the scene depending on the location of the sound source and listener.
- FIG. 1 shows an example of probing a scene 100. Individual probes 102(1)-102(7) are deployed throughout the scene at various locations where listeners can appear at runtime.
- simulations can be employed to model the travel of sound between selected locations in a given scene.
- sound sources can be deployed at given source locations and each probe can act as a listener at the corresponding probe location.
- sound sources can be deployed in a three-dimensional grid of cubic voxels of approximately one cubic meter each (not shown), with one sound source per voxel.
- Simulations can be carried out for each combination of sound sources and listener probes in the scene, as described more below.
- wave simulations can be employed to model acoustic diffraction in the scene.
- the wave simulations can be used to determine how sound will be perceived by listeners at different locations in the scene depending on the location of the sound source.
- perceptual acoustic parameters can be stored representing this information.
- the perceptual acoustic parameters can include first perceptual parameters representing characteristics of coherent signals traveling from source locations to listener locations, and second perceptual parameters representing characteristics of incoherent signals traveling from the source locations to the listener locations.
- the disclosed implementations are not limited to virtual scenes.
- actual sound sources can be deployed as sound sources, with microphones acting as listeners at designated locations in the real-world scene.
- the speakers can play actual sounds that are recorded by the microphones and then the recordings can be processed to derive perceptual parameters as discussed elsewhere herein.
- each probe can be used to precompute acoustic parameters relating to different characteristics of how coherent and incoherent sound are perceived by a listener at the probed location.
- FIG. 2A illustrates a scenario that conveys certain concepts relating to travel of sound in a scene 200.
- sound is emitted by a sound source 202 and is perceived by a listener 204 based on acoustic properties of scene 200.
- scene 200 can have acoustic properties based on geometry of structures within the scene as well as materials of those structures.
- the scene can have structures such as walls 206 and 208.
- the term “geometry” can refer to an arrangement of structures (e.g., physical objects) and/or open spaces in a scene.
- the term “scene” is used herein to refer to any environment in which real or virtual sound can travel, and a “virtual” scene includes any scene with at least one virtual structure.
- structures such as walls can cause occlusion, reflection, diffraction, and/or scattering of sound, etc.
- Some additional examples of structures that can affect sound are furniture, floors, ceilings, vegetation, rocks, hills, ground, tunnels, fences, crowds, buildings, animals, stairs, etc.
- shapes (e.g., edges, uneven surfaces), materials, and/or textures of structures can affect sound.
- structures do not have to be solid objects.
- structures can include water, other liquids, and/or types of air quality that might affect sound and/or sound travel.
- the sound source 202 can generate sound pulses that create corresponding directional impulse responses.
- the directional impulse responses depend on properties of the scene 200 as well as the locations of the sound source and listener.
- the first-arriving peak in the directional impulse response is typically perceived by the listener 204 as an initial sound, and subsequent peaks in the directional impulse response tend to be perceived as echoes. Note that this document adopts the convention that the top of the page faces north for the purposes of discussing directions.
- a given sound pulse can result in many different sound wavefronts that propagate in all directions from the source.
- FIG. 2A shows three coherent sound wavefronts 210(1), 210(2), and 210(3).
- the listener perceives initial sound wavefront 210(1) as arriving from the northeast. For instance, in a virtual reality world based on scene 200, a person (e.g., listener) looking at a wall with a doorway to their right would likely expect to hear a sound coming from their right side, as wall 206 attenuates the sound energy that travels along the line of sight between the sound source and the listener.
- the sound perceived by listener 204 can also include sound wavefronts 210(2) and 210(3) after the initial sound wavefront.
- Each of these three wavefronts can include coherent sound that arrives at the listener at different times and from different locations.
- coherent sound wavefronts departing from a sound source can be represented in different time bins (e.g., of monotonically increasing duration) depending on their arrival times at the listener.
- the precedence effect can be modeled by selecting the duration of the first time bin so that initial sound paths appear within the first time bin.
- FIG. 2B, 2C, and 2D illustrate sound wavefronts 210(1), 210(2), and 210(3) separately to show how they arrive at different times at the listener 204.
- FIG. 2B shows sound wavefront 210(1) arriving at the listener at about 10 milliseconds after the sound is emitted by sound source 202, as conveyed by timeline 212.
- FIG. 2C shows sound wavefront 210(2) arriving at the listener at about 30 milliseconds after the sound is emitted by sound source 202, as conveyed by timeline 212.
- FIG. 2D shows sound wavefront 210(3) arriving at the listener at about 37.5 milliseconds after the sound is emitted by sound source 202, as conveyed by timeline 212.
- One way to represent acoustic parameters in a given scene is to fix a listener location and encode parameters from different potential source locations for sounds that travel from the potential source locations to the fixed listener location. The result is an acoustic parameter field for that listener location.
- each of these fields can represent a horizontal “slice” within a given scene.
- different acoustic parameter fields can be generated for different vertical heights within a scene to create a volumetric representation of sound travel for the scene with respect to the listener location.
- the relative density of each encoded field can be a configurable parameter that varies based on various criteria, where denser fields can be used to obtain more accurate representations and sparser fields can be employed to obtain computational efficiency and/or more compact representations.
- coherent parameter fields can include total sound energy, echo count, centroid time, variance time, and directed energy parameters.
- the directed energy parameter can include a directed unit vector representing an arrival azimuth, an arrival elevation, and a vector length that is inversely related to the extent to which the sound energy is spread out in direction around the arrival azimuth.
- Incoherent parameter fields can include reverberation energy and decay time.
- source and listener locations for a runtime sound source and listener can be determined, and respective coherent and incoherent parameters determined by interpolating from the runtime source and listener locations to nearby probed listener locations and source locations on a voxel grid.
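For illustration, the following sketch shows how a single encoded parameter field stored on a regular source-location voxel grid might be interpolated at an arbitrary runtime source position. The grid layout, cell size, and function name are assumptions for this example; a full implementation would also interpolate across nearby listener probes and handle invalid or unreachable voxels.

```python
import numpy as np

def trilinear_lookup(field, pos, origin, cell_size):
    """Interpolate one encoded acoustic parameter stored on a regular voxel
    grid of source locations at an arbitrary runtime source position.

    field     : 3D array of an encoded parameter for one listener probe
    pos       : runtime source position (x, y, z) in scene coordinates
    origin    : world-space position of voxel (0, 0, 0)
    cell_size : voxel edge length (e.g., ~1 m, per the probing description)
    """
    f = (np.asarray(pos, dtype=float) - np.asarray(origin, dtype=float)) / cell_size
    i0 = np.clip(np.floor(f).astype(int), 0, np.array(field.shape) - 2)
    t = f - i0
    acc = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                # trilinear weight for this corner of the containing voxel
                w = (t[0] if dx else 1 - t[0]) * \
                    (t[1] if dy else 1 - t[1]) * \
                    (t[2] if dz else 1 - t[2])
                acc += w * field[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return acc
```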
- These are examples of acoustic parameters that can be encoded for various scenes. Further, note that these parameters can be simulated and precomputed using isotropic sound sources. At rendering time, sound source and listener locations can be accounted for when rendering sound. Thus, as discussed more below, the disclosed implementations offer the ability to encode perceptual parameters using isotropic sources that allow for runtime rendering of sound.
- system 300 can include a parameterized acoustic component 302.
- the parameterized acoustic component 302 can operate on a scene such as a virtual reality (VR) space 304.
- the parameterized acoustic component 302 can be used to produce realistic rendered sound 306 for the virtual reality space 304.
- functions of the parameterized acoustic component 302 can be organized into three stages. For instance, Stage One can relate to simulation 308, Stage Two can relate to perceptual encoding 310, and Stage Three can relate to rendering 312. Stage One and Stage Two can be implemented as precompute steps, and Stage Three can be performed at runtime.
- as also shown in FIG. 3, the virtual reality space 304 can have associated virtual reality space data 314.
- the parameterized acoustic component 302 can also operate on and/or produce directional impulse responses 316, perceptual acoustic parameters 318, and sound event input 320, which can include sound source data 322 and/or listener data 324 associated with a sound event in the virtual reality space 304.
- the rendered sound 306 can include coherent and incoherent components.
- parameterized acoustic component 302 can receive virtual reality space data 314.
- the virtual reality space data 314 can include geometry (e.g., structures, materials of objects, portals, etc.) in the virtual reality space 304.
- the virtual reality space data 314 can include a voxel map for the virtual reality space 304 that maps the geometry, including structures and/or other aspects of the virtual reality space 304.
- simulation 308 can include acoustic simulations of the virtual reality space 304 to precompute fields of coherent and incoherent acoustic parameters, such as those discussed above.
- simulation 308 can include generation of directional impulse responses 316 using the virtual reality space data 314. Pressure and three- dimensional velocity signals of the directional impulse responses 316 can be split into coherent and incoherent components, and perceptual acoustic parameters can be derived from each component. Stated another way, simulation 308 can include using a precomputed wave-based approach to capture the acoustic characteristics of a complex scene.
- directional impulse responses 316 can be generated based on probes deployed at particular listener locations within virtual reality space 304.
- Example probes are shown above in FIG. 1. This involves significantly less data storage than sampling at every potential listener location (e.g., every voxel).
- the probes can be automatically laid out within the virtual reality space 304 and/or can be adaptively sampled. For instance, probes can be located more densely in spaces where scene geometry is locally complex (e.g., inside a narrow corridor with multiple portals), and located more sparsely in a wide-open space (e.g., outdoor field or meadow).
- vertical dimensions of the probes can be constrained to account for the height of human listeners, e.g., the probes may be instantiated with vertical dimensions that roughly account for the average height of a human being.
- potential sound source locations for which directional impulse responses 316 are generated can be located more densely or sparsely as scene geometry permits. Reducing the number of locations within the virtual reality space 304 for which the directional impulse responses 316 are generated can significantly reduce data processing and/or data storage expenses in Stage One.
- perceptual encoding 310 can be performed on the directional impulse responses 316 from Stage One.
- perceptual encoding 310 can work cooperatively with simulation 308 to perform streaming encoding.
- the perceptual encoding process can receive and compress individual directional impulse responses as they are being produced by simulation 308. For instance, values can be quantized and techniques such as delta encoding can be applied to the quantized values.
- perceptual parameters tend to be relatively smooth, which enables more compact compression using such techniques. Taken together, encoding parameters in this manner can significantly reduce storage expense.
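A minimal sketch of that compression step follows. The quantization step size and scan order are illustrative assumptions; the point is simply that smooth parameter fields produce mostly small deltas, which downstream entropy coding can pack compactly.

```python
import numpy as np

def delta_encode(values, step):
    """Quantize a smoothly varying parameter field and delta-encode it."""
    q = np.round(np.asarray(values, dtype=float) / step).astype(np.int32)
    return np.diff(q, prepend=0)   # first delta carries the first value

def delta_decode(deltas, step):
    """Invert delta_encode: running sum of deltas, then rescale."""
    return np.cumsum(deltas) * step
```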
- perceptual encoding 310 can involve extracting perceptual acoustic parameters 318 from the directional impulse responses 316. These parameters generally represent how sound from different source locations is perceived at different listener locations. Example parameters are discussed above.
- the perceptual acoustic parameters for a given source/listener location pair can include first perceptual parameters representing characteristics of coherent signals traveling from source locations to listener locations, and second perceptual parameters representing characteristics of incoherent signals traveling from the source locations to the listener locations.
- Encoding perceptual acoustic parameters in this manner can yield a manageable data volume for the perceptual acoustic parameters, e.g., in a relatively compact data file that can later be used for computationally efficient rendering of coherent and incoherent sound simultaneously.
- Some implementations can also encode frequency dependence of materials of a surface that affect the sound response when a sound hits the surface (e.g., changing properties of the resultant echoes).
- rendering 312 can utilize the perceptual acoustic parameters 318 to render sound.
- the perceptual acoustic parameters 318 can be obtained in advance and stored, such as in the form of a data file.
- Sound event input 320 can be used to render sound in the scene based on the perceptual acoustic parameters as described more below.
- the sound event input 320 shown in FIG. 3 can be related to any event in the virtual reality space 304 that creates a response in sound.
- the sound source data 322 for a given sound event can include an input sound signal for a runtime sound source and a location of the runtime sound source.
- the term "runtime sound source" is used to refer to the sound source being rendered, to distinguish the runtime sound source from sound sources discussed above with respect to simulation and encoding of parameters.
- the listener data 324 can convey a location of a runtime listener.
- the term “runtime listener” is used to refer to the listener of the rendered sound at runtime, to distinguish the runtime listener from listeners discussed above with respect to simulation and encoding of parameters.
- the listener data can also convey directional hearing characteristics of the listener, e.g., in the form of a head-related transfer function (HRTF).
- sounds can be rendered using a lightweight signal processing algorithm.
- the lightweight signal processing algorithm can render sound in a manner that is largely computationally cost-insensitive to the number of sound sources and/or sound events. For example, the parameters used in Stage Two can be selected such that the number of sound sources processed in Stage Three does not linearly increase processing expense.
- the sound source data for the input event can include an input signal, e.g., a time-domain representation of a sound such as a series of samples of signal amplitude (e.g., 44100 samples per second).
- the input signal can have multiple frequency components and corresponding magnitudes and phases.
- the input signal can be rendered at the runtime listener location using separate parameter sets for coherent and incoherent components, as described more below.
- the parameterized acoustic component 302 can operate on a variety of virtual reality spaces 304.
- a video-game type virtual reality space 304 can be parameterized as described herein.
- virtual reality space 304 can be an augmented conference room that mirrors a real-world conference room. For example, live attendees could be coming and going from the real-world conference room, while remote attendees log in and out. In this example, the voice of a particular live attendee, as rendered in the headset of a remote attendee, could fade away as the live attendee walks out a door of the real-world conference room.
- animation can be viewed as a type of virtual reality scenario.
- the parameterized acoustic component 302 can be paired with an animation process, such as for production of an animated movie.
- virtual reality space data 314 could include geometry of the animated scene depicted in the visual frames.
- a listener location could be an estimated audience location for viewing the animation.
- Sound source data 322 could include information related to sounds produced by animated subjects and/or objects.
- the parameterized acoustic component 302 can work cooperatively with an animation system to model and/or render sound to accompany the visual frames.
- the disclosed concepts can be used to complement visual special effects in live action movies. For example, virtual content can be added to real world video images.
- a real-world video can be captured of a city scene.
- virtual image content can be added to the real-world video, such as an animated character playing a trombone in the scene.
- relevant geometry of the buildings surrounding the corner would likely be known for the post-production addition of the virtual image content.
- the parameterized acoustic component 302 can provide immersive audio corresponding to the enhanced live action movie.
- the acoustic quality at each listener location (e.g., seat) can be evaluated for different proposed architectural designs.
- a particular design can be selected that has suitable acoustic characteristics at each listener location, without needing to build a physical model or perform a full path-tracing simulation of each proposed design.
- the parameterized acoustic component 302 can model acoustic effects for arbitrarily moving listener and/or sound sources that can emit any sound signal.
- the result can be a practical system that can render convincing audio in real-time.
- the parameterized acoustic component can render convincing audio for complex scenes while solving a previously intractable technical problem of processing petabyte-scale wave fields.
- the techniques disclosed herein can be used to render sound for complex 3D scenes within practical RAM and/or CPU budgets.
- the result can be a practical system that can produce convincing sound for video games, virtual reality scenarios, or architectural acoustic scenarios.
- FIG. 4 shows a system 400 that can accomplish parametric encoding and rendering as discussed herein.
- system 400 can include one or more devices 402.
- the device may interact with and/or include input devices such as a controller 404, speakers 405, displays 406, and/or sensors 407.
- the sensors can be manifest as various 2D, 3D, and/or microelectromechanical systems (MEMS) devices.
- the devices 402, controller 404, speakers 405, displays 406, and/or sensors 407 can communicate via one or more networks (represented by lightning bolts 408).
- example device 402(1) is manifest as a server device
- example device 402(2) is manifest as a gaming console device
- example device 402(3) is manifest as a speaker set
- example device 402(4) is manifest as a notebook computer
- example device 402(5) is manifest as headphones
- example device 402(6) is manifest as a virtual reality device such as a headmounted display (HMD) device.
- device 402(2) and device 402(3) can be proximate to one another, such as in a home video game type scenario.
- devices 402 can be remote.
- device 402(1) can be in a server farm and can receive and/or transmit data related to the concepts disclosed herein.
- FIG. 4 shows two device configurations 410 that can be employed by devices 402.
- Individual devices 402 can employ either of configurations 410(1) or 410(2), or an alternate configuration.
- device configuration 410(1) represents an operating system (OS) centric configuration.
- Device configuration 410(2) represents a system on a chip (SOC) configuration.
- Device configuration 410(1) is organized into one or more application(s) 412, operating system 414, and hardware 416.
- Device configuration 410(2) is organized into shared resources 418, dedicated resources 420, and an interface 422 there between.
- the device can include storage/memory 424, a processor 426, and/or a parameterized acoustic component 428.
- the parameterized acoustic component 428 can be similar to the parameterized acoustic component 302 introduced above relative to FIG. 3.
- the parameterized acoustic component 428 can be configured to perform the implementations described above and below.
- each of devices 402 can have an instance of the parameterized acoustic component 428.
- the functionalities that can be performed by parameterized acoustic component 428 may be the same or they may be different from one another.
- each device’s parameterized acoustic component 428 can be robust and provide all of the functionality described above and below (e.g., a device-centric implementation).
- some devices can employ a less robust instance of the parameterized acoustic component that relies on some functionality to be performed remotely.
- the parameterized acoustic component 428 on device 402(1) can perform functionality related to Stages One and Two, described above for a given application, such as a video game or virtual reality application.
- the parameterized acoustic component 428 on device 402(2) can communicate with device 402(1) to receive perceptual acoustic parameters 318.
- the parameterized acoustic component 428 on device 402(2) can utilize the perceptual parameters with sound event inputs to produce rendered sound 306, which can be played by speakers 405(1) and 405(2) for the user.
- the sensors 407 can provide information about the location and/or orientation of a user of the device (e.g., the user’s head and/or eyes relative to visual content presented on the display 406(2)).
- the location and/or orientation can be used for rendering sounds to the user by treating the user as a listener or, in some cases, as a sound source.
- a visual representation (e.g., visual content, graphical user interface) can be presented on the display 406(2).
- the visual representation can be based at least in part on the information about the location and/or orientation of the user provided by the sensors.
- the parameterized acoustic component 428 on device 402(6) can receive perceptual acoustic parameters from device 402(1).
- the parameterized acoustic component 428(6) can produce rendered sound in accordance with the representation.
- stereoscopic sound can be rendered through the speakers 405(5) and 405(6) representing how coherent and incoherent sound are perceived at the location of the user.
- Stage One and Two described above can be performed responsive to inputs provided by a video game, a virtual reality application, or an architectural acoustics application.
- the output of these stages (e.g., perceptual acoustic parameters 318) can be provided, together with sound event inputs, to a plugin employed by the video game, virtual reality application, or architectural acoustics application at runtime.
- the plugin can apply the perceptual parameters to the sound event to compute the corresponding rendered sound for the sound event.
- the video game, virtual reality application, or architectural acoustics application can provide sound event inputs to a separate rendering component (e.g., provided by an operating system) that renders sound on behalf of the video game, virtual reality application, or architectural acoustics application.
- the disclosed implementations can be provided by a plugin for an application development environment.
- an application development environment can provide various tools for developing video games, virtual reality applications, and/or architectural acoustic applications. These tools can be augmented by a plugin that implements one or more of the stages discussed above.
- an application developer can provide a description of a scene to the plugin and the plugin can perform the disclosed simulation techniques on a local or remote device, and output encoded perceptual parameters for the scene.
- the plugin can implement scene-specific rendering given an input sound signal and information about runtime source and listener locations.
- the term “device,” “computer,” or “computing device” as used herein can mean any type of device that has some amount of processing capability and/or storage capability. Processing capability can be provided by one or more processors that can execute computer-readable instructions to provide functionality. Data and/or computer-readable instructions can be stored on storage, such as storage that can be internal or external to the device.
- the storage can include any one or more of volatile or non-volatile memory, hard drives, flash storage devices, and/or optical storage devices (e.g., CDs, DVDs etc.), remote storage (e.g., cloud-based storage), among others.
- the term “computer-readable media” can include signals. In contrast, the term “computer-readable storage media” excludes signals.
- Computer-readable storage media includes “computer-readable storage devices.” Examples of computer-readable storage devices include volatile storage media, such as RAM, and non-volatile storage media, such as hard drives, optical discs, and flash memory, among others.
- device configuration 410(2) can be thought of as a system on a chip (SOC) type design.
- functionality provided by the device can be integrated on a single SOC or multiple coupled SOCs.
- One or more processors 426 can be configured to coordinate with shared resources 418, such as storage/memory 424, etc., and/or one or more dedicated resources 420, such as hardware blocks configured to perform certain specific functionality.
- the term “processor” as used herein can also refer to central processing units (CPUs), graphical processing units (GPUs), field programmable gate arrays (FPGAs), controllers, microcontrollers, processor cores, or other types of processing devices.
- any of the functions described herein can be implemented using software, firmware, hardware (e.g., fixed-logic circuitry), or a combination of these implementations.
- the term “component” as used herein generally represents software, firmware, hardware, whole devices or networks, or a combination thereof. In the case of a software implementation, for instance, these may represent program code that performs specified tasks when executed on a processor (e.g., CPU or CPUs).
- the program code can be stored in one or more computer-readable memory devices, such as computer-readable storage media.
- the features and techniques of the component are platform-independent, meaning that they may be implemented on a variety of commercial computing platforms having a variety of processing configurations.
- method 500 can receive virtual reality space data corresponding to a virtual reality space.
- the virtual reality space data can represent a geometry of the virtual reality space.
- the virtual reality space data can describe structures, such as walls, floors, ceilings, etc.
- the virtual reality space data can also include additional information related to the geometry, such as surface texture, material, thickness, etc.
- method 500 can use the virtual reality space data to generate directional impulse responses for the virtual reality space.
- method 500 can generate the directional impulse responses by simulating initial sounds emanating from multiple moving sound sources and/or arriving at multiple moving listeners.
- method 500 can process the directional impulse responses to obtain coherent sound signals and incoherent sound signals.
- the directional impulse response signal can be split by applying a scalar weighting value α, ranging between zero and one, to each sample of the directional impulse response signal, as described further below.
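The sketch below shows one natural reading of that per-sample weighting, in which the weight scales the incoherent portion and its complement scales the coherent portion; the exact split rule is not spelled out in this excerpt, so treat the assignment as an assumption.

```python
import numpy as np

def split_response(d, alpha):
    """Split an impulse-response channel (pressure or one velocity component)
    into coherent and incoherent parts using a per-sample weight alpha in
    [0, 1], where alpha near 1 indicates reverberation-like samples.

    Returns (coherent, incoherent) with coherent + incoherent == d.
    """
    alpha = np.clip(np.asarray(alpha, dtype=float), 0.0, 1.0)
    incoherent = alpha * d          # reverberation-like portion
    coherent = (1.0 - alpha) * d    # echo-like portion
    return coherent, incoherent
```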
- method 500 can encode first perceptual parameters from the coherent signals and second perceptual parameters from the incoherent signals, as described more below.
- method 500 can output the encoded perceptual parameters. For instance, method 500 can output the encoded perceptual parameters on storage, over a network, via shared memory to an application process, etc.
- method 600 can receive an input sound signal for a sound source having a corresponding runtime source location in a scene.
- the input sound signal can be a time-domain representation of a sound that has multiple frequency components and corresponding magnitudes and phases.
- method 600 can identify encoded first perceptual parameters and encoded second perceptual parameters for a runtime listener location in the scene.
- the encoded first perceptual parameters can represent characteristics of coherent signals departing the source location and arriving at the listener location.
- the encoded second perceptual parameters can represent characteristics of incoherent signals departing the source location and arriving at the listener location.
- the encoded perceptual parameters can be interpolated to account for differences between the runtime source location and runtime listener location and the source/listener locations for which the encoded parameters were simulated.
- method 600 can use the input sound signal and the encoded first perceptual parameters to render coherent sound at the listener location.
- the method 600 can use the input sound signal and the encoded second perceptual parameters to render incoherent sound at the listener location.
- blocks 606 and 608 can be performed concurrently.
- the sound perceived by the listener can include coherent and incoherent components that are perceived simultaneously by the listener.
- parameters can be encoded for various potential listener probe locations in a given scene.
- a volumetric wave simulation can be performed from that probe, providing both the scalar pressure, p and 3D particle velocity, v at a dense set of points in 3D space.
- the generation of a simulated impulse response can proceed in time-steps.
- a perceptual encoding component can extract salient perceptual properties from d(t; x) as a compact set of encoded parameters at each cell yielding a set of parameter fields P(x). These parameter fields can be concatenated over listener probes, stored in a data file that is loaded at runtime, and rendered in real-time.
- a set of encoded parameters P can be determined for each listener location.
- the parameters can be encoded such that salient perceptual aspects of acoustics such as directional echoes and reverberation are captured while employing a compact memory budget.
- P can be extracted in an efficient, streaming fashion.
- the signal-processing algorithm that performs the encoding d(t) -> P can minimize or reduce storage of past history. Failing to do so could exceed the RAM budget on a desktop machine or cloud virtual machine where simulation is performed during offline computation.
- the disclosed techniques can be employed for extracting perceptually-salient aspects of directional acoustic impulse responses.
- the disclosed techniques can enable high-quality yet real-time rendering of audio-visual scenes in gaming, mixed reality and engineering applications such as architecture.
- the following concepts can be employed:
- Streaming coherent-incoherent splitting allows separate perceptual assumptions that can be compactly encoded into two components.
- a coherent (echo) component contains strong reflections.
- An incoherent (noise) component contains reverberation.
- the disclosed implementations can split an incoming directional impulse response, sample-by-sample, into two signals: the coherent component isolates echoes as peaked arrivals in time and direction, whereas the incoherent component contains reverberation with numerous arrivals with similar amplitudes that the brain combines into an overall perception.
- Human auditory perception is sensitive to these two distinct aspects of directional impulse responses, and thus these components can be represented by separate perceptual parameter sets.
- splitting is performed while encoding complex directional impulse responses measured in the real world or accurately simulated with numerical solvers that closely mimic the real world.
- the directional impulse responses generated in the real world or using a numerical solver do not separate coherent and incoherent components of a sound signal.
- the disclosed implementations can be employed to successfully separate these two components using statistical measures.
- the disclosed implementations provide for streaming, on-the-fly encoding of directional impulse responses with knowledge of only the current value and a limited history of past values.
- FIG. 7 shows a schematic of an encoder 700.
- a splitter component 702 determines a degree of incoherence value α(t) ∈ [0, 1] for each time-sample. When the signal looks like reverberation, this value approaches 1; when it contains one or a few outlying large values (echoes), it approaches 0.
- the splitter component is allowed to keep internal history of past input values.
- An example result of splitting a signal is shown in FIG. 8, with an input pressure response graph 802, degree of incoherence graph 804, coherent component graph 806, and incoherent component graph 808. Note that each signal also can include three velocity component signals that are not shown in FIG. 8.
- this measure is similar to spectral flatness measures that can be employed in speech and music processing literature on frequency-domain data.
- the flatness measure of equation (2) is a general measure of how random/stochastic the signal looks and can be applied to the energy envelope of an impulse response, since reverberation can be well-modeled as a stochastic process.
- this low-pass filter is implemented via summing over a history buffer of 10ms duration, but like the energy envelope, a recursive filter implementation could be employed instead.
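As a rough illustration of such a streaming estimator, the sketch below computes a flatness-style degree of incoherence over a short energy-envelope history buffer. Equation (2) is not reproduced in this excerpt, so the specific formula (geometric-to-arithmetic mean ratio) and the 10 ms window are assumptions drawn from the surrounding description.

```python
import numpy as np
from collections import deque

class IncoherenceEstimator:
    """Streaming, flatness-style estimate of the degree of incoherence alpha(t).

    The energy envelope e[n] = d[n]**2 is smoothed over a short history
    buffer; alpha is the geometric-to-arithmetic mean ratio of that buffer.
    Flatness is near 1 for noise-like (reverberant) stretches and near 0
    when one or a few samples dominate (echoes).
    """

    def __init__(self, sample_rate=48000, window_ms=10.0, eps=1e-20):
        self.buf = deque(maxlen=max(int(sample_rate * window_ms / 1000.0), 1))
        self.eps = eps

    def update(self, sample):
        self.buf.append(sample * sample + self.eps)   # energy envelope
        e = np.asarray(self.buf)
        geometric = np.exp(np.mean(np.log(e)))
        arithmetic = np.mean(e)
        return float(geometric / arithmetic)          # alpha in (0, 1]
```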
- encoder 700 can detect the onset time T_0 at which the first peak of a given impulse response occurs.
- FIG. 9 illustrates an example of an adaptive time-binning approach with time bins 902(1) through 902(16). This adaptivity is based on observations about auditory perception, such as the precedence effect. Spatial localization has a fusion interval of 1ms after onset. Reflections for signals such as clicks, speech, and music can be perceptually fused with the onset over an interval of 10ms, 50ms, and 80ms respectively.
- a suitably increasing size for binning can capture most of these salient aspects, while requiring far less memory than a constant-sized bin width (e.g., 10 ms).
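A simple way to realize such monotonically growing bins is sketched below; the first-bin width, growth factor, and encoding horizon are illustrative values chosen to echo the fusion intervals mentioned above, not values taken from this description.

```python
def make_time_bins(first_ms=1.0, growth=1.5, total_ms=200.0):
    """Generate adaptively growing time-bin edges (in ms) after onset.

    A 1 ms first bin (spatial-localization fusion interval) is followed by
    bins whose widths grow geometrically until the coherent encoding
    horizon is covered.
    """
    edges = [0.0]
    width = first_ms
    while edges[-1] < total_ms:
        edges.append(min(edges[-1] + width, total_ms))
        width *= growth
    return edges

# Example: make_time_bins() -> [0.0, 1.0, 2.5, 4.75, 8.125, ...]
```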
- a constant-sized bin width is employed in audio coding applications such as mp3 because there is no special start time - transients can occur at any time in an audio recording, for any number of sounds.
- Impulse responses are different in that there is a special onset time, and thus strong perceptual assumptions can be made starting at that time, in particular that information closer to the onset time carries higher salience. This allows the disclosed implementations to compactly encode directional impulse responses.
- This formulation is one of many possible measures of signal sparsity. In the disclosed implementations, it provides a direct notion of the number of echoes within a time bin for complex responses, while mitigating expensive and error-prone processing for fitting peaks. This approach also has the advantage that it is easily streamed relative to approaches that involve building a histogram. The number of echoes can convey the distinction between, e.g., an empty unfurnished room which will have energy concentrated in a few echoes, versus a furnished room where energy gets distributed into many more echoes due to repeated scattering.
- Variance time: S_b. The spread of arriving energy around the centroid time. The value is small when there is a single, crisp peak, or many peaks clustered close together, and larger when there are similarly-energetic arrivals spread throughout the bin’s duration.
- Directed energy: D_b. The corresponding direction unit vector D_b/||D_b|| is the centroid direction from which energy arrives within this time bin.
- Curves 904(1) ... 904(7) illustrate encoded centroid time, C b , and variance time, S b for corresponding bins 902(1) ... 902(7), while curves 906(1) ... 906(7) represent the corresponding coherent signal for each time bin. Note that curves 904 and 906 are not separately labeled for time bins 902(8) ... 902(12). Curves 904(1) ... (7) can be obtained by synthesizing an equivalent Gaussian function that would encode to the same parameters. This illustrates that an exact waveform fitting is not necessarily employed, but rather the aggregate character of when and which direction energy arrives at the listener is obtained. Each of the parameters above is computable in a streaming fashion via accumulators that compute a running sum that approximates the integral terms.
- each bin occurs successively in time; thus, once a given bin’s end time has been reached, the parameters for the bin may be computed, stored, and accumulators for the integrals reset for the next bin. This limits the memory utilization of the disclosed encoding process. Further, post-processing operators such as the division by E b , or squaring in point 2, as applicable, can be applied during this parameter extraction step when the end time for a given bin is reached.
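The following sketch illustrates per-bin running sums of this kind. The echo-count estimate shown uses a participation-ratio style sparsity measure; the exact sparsity formula, any normalizations, and the source of the per-sample arrival direction (e.g., derived from the velocity signal) are assumptions for this example rather than quotes from the description.

```python
import numpy as np

class BinAccumulator:
    """Streaming accumulators for one coherent time bin.

    Per-sample energy e = p**2 is accumulated together with its first and
    second time moments and a direction-weighted sum, approximating the
    integral terms with running sums.
    """

    def __init__(self):
        self.e_sum = 0.0           # total energy E_b
        self.te_sum = 0.0          # first time moment
        self.t2e_sum = 0.0         # second time moment
        self.e2_sum = 0.0          # for the sparsity/echo-count estimate
        self.d_sum = np.zeros(3)   # directed energy D_b

    def add(self, t, p, direction):
        e = p * p
        self.e_sum += e
        self.te_sum += t * e
        self.t2e_sum += t * t * e
        self.e2_sum += e * e
        self.d_sum += e * np.asarray(direction)   # unit arrival direction

    def finalize(self):
        E_b = self.e_sum
        if E_b <= 0.0:
            return dict(E_b=0.0, C_b=0.0, S_b=0.0, N_b=0.0, D_b=np.zeros(3))
        C_b = self.te_sum / E_b
        S_b = float(np.sqrt(max(self.t2e_sum / E_b - C_b * C_b, 0.0)))
        N_b = E_b * E_b / self.e2_sum   # participation-ratio echo count (assumed)
        return dict(E_b=E_b, C_b=C_b, S_b=S_b, N_b=N_b, D_b=self.d_sum)
```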
- Schroeder backward integration causes underestimation issues since backward integration in dB domain dips towards −∞ at the end, so manual inspection of the curve is used to fix the line-fitting interval.
- Backward integration also involves processing the complete response and thus is not suited for streaming.
- the disclosed streaming method overcomes these deficiencies with relatively little compute per time-step.
- the exponential model can be fit by solving the following relations for the unknown quantities {E_0, β}.
- the above relations are solved numerically.
- One solution technique is to observe that in the asymptotic limit T_sim → ∞, C_r → 1/β.
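As an illustration of that fit, the sketch below solves finite-window moment equations for an exponential energy envelope E(t) = E_0·exp(−βt), using the asymptotic relation C_r → 1/β to bracket the search. The specific relations referenced in the description are not reproduced in this excerpt, so the equations below are a standard-model assumption.

```python
import numpy as np

def fit_exponential_decay(E_r, C_r, T_sim):
    """Fit an energy envelope E(t) = E0 * exp(-beta * t) to an incoherent tail
    summarized by its total energy E_r and energy-centroid time C_r over a
    finite simulation window T_sim.

    Assumed moment equations (not quoted from the description):
        E_r = E0 * (1 - exp(-beta*T)) / beta
        C_r = (1 - exp(-beta*T)*(1 + beta*T)) / (beta * (1 - exp(-beta*T)))
    A valid bracket assumes C_r < T_sim / 2.
    """
    def centroid(beta):
        x = beta * T_sim
        return (1.0 - np.exp(-x) * (1.0 + x)) / (beta * (1.0 - np.exp(-x)))

    lo, hi = 0.1 / C_r, 100.0 / C_r        # centroid(lo) > C_r > centroid(hi)
    for _ in range(80):                    # bisection on beta
        mid = 0.5 * (lo + hi)
        if centroid(mid) > C_r:            # beta still too small
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    E0 = beta * E_r / (1.0 - np.exp(-beta * T_sim))
    T60 = 6.0 * np.log(10.0) / beta        # 60 dB decay of the energy envelope
    return E0, beta, T60
```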
- This set of parameters can be extended to each of six axial directions, and can also be extended by encoding different decay times for different frequency bands.
- FIG. 10 illustrates an example renderer 1000 that can be employed to render the encoded parameters for a single input sound source.
- the functionality of renderer 1000 is described below.
- the disclosed encoding process results in two sets of parameters.
- the number of bins is implementation dependent based on how fast the bin size increases, and up to what time the coherent component is encoded. A value of approximately a few hundred milliseconds may be safely above the mixing time of typical spaces of interest.
- the parameters can be decoded at runtime by performing lookups into a dataset based on the current source and listener locations.
- Vector parameters such as D_b or D_r can be treated as three scalar parameters for the respective Cartesian components.
- Renderer 1000 can invoke a few abstract functions as follows:
- Shaped velvet noise, V. Generate N_b random time samples within the time bin, with the samples drawn from a Gaussian probability distribution function whose standard deviation is determined by the encoded variance time S_b. Each sample is then assigned a random sign of ±1, with amplitude governed by some smooth shaping curve within the time bin.
- The resulting pressure signal V(t; N_b, S_b) is such that, if encoded as described above, the echo count N_b and variance time S_b can be recovered. Wide latitude is allowed in the precise signal generated at this stage, as long as it satisfies the two parameters, thus allowing selection of signals that are perceptually convincing and efficient to render.
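- One possible realization of this abstract function is sketched below; centering the Gaussian arrival times at the bin's encoded centroid C_b and using a Hann window as the smooth shaping curve are assumptions chosen for the example.

```python
import numpy as np

def shaped_velvet_noise(n_samples, fs, t_bin_start, t_bin_end, N_b, C_b, S_b, rng=None):
    """Generate a shaped velvet-noise pulse train for one time bin (illustrative sketch).

    N_b impulses with random +/-1 signs, arrival times drawn from a Gaussian
    (assumed centered at C_b, standard deviation S_b) clipped to the bin, and
    amplitudes governed by a smooth shaping curve (here a Hann window)."""
    rng = np.random.default_rng() if rng is None else rng
    v = np.zeros(n_samples)  # spans the full response starting at t = 0
    n_pulses = max(int(round(N_b)), 1)
    times = rng.normal(C_b, S_b, n_pulses)
    times = np.clip(times, t_bin_start, t_bin_end - 1.0 / fs)
    signs = rng.choice([-1.0, 1.0], n_pulses)

    def shape(t):
        # Smooth shaping curve over the bin (one possible choice).
        x = (t - t_bin_start) / (t_bin_end - t_bin_start)
        return 0.5 - 0.5 * np.cos(2.0 * np.pi * x)

    for t, s in zip(times, signs):
        idx = int(round(t * fs))
        if 0 <= idx < n_samples:
            v[idx] += s * shape(t)
    return v
```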
- Decaying Gaussian noise, G. This can be defined as G(t; T_60) ≡ η(t) exp(−6 ln(10) · t/T_60), where η(t) is Gaussian noise with variance of 1.
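- A minimal sketch of this abstract function, following the formula as reconstructed above (whether the 60 dB decay is interpreted on the amplitude or the energy envelope is left as an implementation choice):

```python
import numpy as np

def decaying_gaussian_noise(n_samples, fs, T60, rng=None):
    """Unit-variance Gaussian noise with an exponential decay envelope (illustrative sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(n_samples) / fs
    eta = rng.standard_normal(n_samples)              # Gaussian noise, variance 1
    return eta * np.exp(-6.0 * np.log(10.0) * t / T60)
```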
- Spatialization. An abstract spatializer that takes a monophonic input signal and, depending on the direction parameter D (D_b or D_r, as the case may be), computes multi-channel spatial audio signals that create the impression of the sound arriving from the centroid direction D/‖D‖.
- An HRTF spatializer may be used to produce binaural stereo signals for headphone playback.
- The vector length ‖D‖ can be used to compute the equivalent angle of a cone with the centroid direction as its central axis. Assuming that energy arriving with equal probability over the cone results in the observed value, the equivalent cone angle θ_D obeys:
- A precomputed lookup table can be employed to solve equation (4). If the spatializer expects direction and spread as separate parameters, they may then be equivalently provided as [D/‖D‖, θ_D].
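- The sketch below shows the shape of such a spatializer abstraction. The constant-power stereo panner is a deliberately simple stand-in used only for illustration; it is not an HRTF spatializer, and a real implementation could additionally accept the spread angle θ_D from equation (4).

```python
import numpy as np
from abc import ABC, abstractmethod

class Spatializer(ABC):
    """Abstract spatializer: mono input signal + direction -> multi-channel output."""
    @abstractmethod
    def spatialize(self, mono, direction):
        ...

class StereoPanSpatializer(Spatializer):
    """Toy stand-in (illustrative only): constant-power panning driven by the
    horizontal component of the centroid direction."""
    def spatialize(self, mono, direction):
        d = np.asarray(direction, dtype=float)
        d = d / (np.linalg.norm(d) + 1e-12)   # D / ||D||
        # Map the left-right component (assumed +x = listener's right) to a pan value.
        pan = 0.5 * (d[0] + 1.0)              # 0 = full left, 1 = full right
        left = np.cos(0.5 * np.pi * pan) * mono
        right = np.sin(0.5 * np.pi * pan) * mono
        return np.stack([left, right])
```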
- The description relates to parameterized encoding and rendering of sound.
- The disclosed techniques and components can be used to create accurate acoustic parameters for video game scenes, virtual reality scenes, architectural acoustic scenes, or other applications.
- The parameters can be used to render sound with higher fidelity and greater realism than is available through other sound modeling and/or rendering methods.
- The parameters can be stored and employed for rendering within reasonable processing and/or storage budgets.
- The methods described above and below can be performed by the systems and/or devices described above, and/or by other devices and/or systems.
- The order in which the methods are described is not intended to be construed as a limitation, and any number of the described acts can be combined in any order to implement the methods, or an alternate method(s).
- The methods can be implemented in any suitable hardware, software, firmware, or combination thereof, such that a device can implement the methods.
- The method or methods are stored on computer-readable storage media as a set of computer-readable instructions such that execution by a computing device causes the computing device to perform the method(s).
- One example includes a method comprising generating directional impulse responses for a scene, the directional impulse responses corresponding to sound departing from multiple sound source locations and arriving at multiple listener locations in the scene, processing the directional impulse responses to obtain coherent sound signals and incoherent sound signals, encoding first perceptual acoustic parameters from the coherent sound signals and second perceptual acoustic parameters from the incoherent sound signals, and outputting the encoded first perceptual acoustic parameters and the encoded second perceptual acoustic parameters.
- Another example can include any of the above and/or below examples where the encoding comprises generating first acoustic parameter fields of the first perceptual acoustic parameters and second acoustic parameter fields of the second perceptual acoustic parameters, each first acoustic parameter field having a set of first perceptual acoustic parameters representing characteristics of the coherent sound signals arriving at a particular listener location from the multiple sound source locations, and each second acoustic parameter field having a set of second perceptual acoustic parameters representing characteristics of the incoherent sound signals arriving at the particular listener location from the multiple sound source locations.
- Another example can include any of the above and/or below examples where processing the directional impulse responses comprises splitting pressure signals and velocity signals of the directional impulse responses to obtain the coherent sound signals and the incoherent sound signals.
- Another example can include any of the above and/or below examples where the method further comprises determining scalar values for respective samples of the pressure signals, the scalar values characterizing incoherence of the respective samples and modifying the respective samples of the pressure signals based at least on the scalar values to extract the coherent sound signals and the incoherent sound signals.
- Another example can include any of the above and/or below examples where the encoding comprises determining the first perceptual acoustic parameters for a plurality of time bins.
- Another example can include any of the above and/or below examples where the time bins have monotonically increasing durations.
- Another example can include any of the above and/or below examples where the first perceptual acoustic parameters for a particular time bin include a total sound energy of the particular time bin.
- Another example can include any of the above and/or below examples where the first perceptual acoustic parameters for a particular time bin include an echo count for the particular time bin, the echo count representing a number of coherent sound reflections in the particular time bin.
- Another example can include any of the above and/or below examples where the first perceptual acoustic parameters for a particular time bin include a centroid time for the particular time bin, the centroid time representing a particular time in the particular time bin where peak sound energy is present.
- Another example can include any of the above and/or below examples where the first perceptual acoustic parameters for a particular time bin include a variance time for the particular time bin, the variance time representing the extent to which sound energy is spread out in time within the particular time bin.
- Another example can include any of the above and/or below examples where the first perceptual acoustic parameters for a particular time bin include a directed energy parameter representing an arrival direction of sound energy at the listener location.
- Another example can include any of the above and/or below examples where the directed energy parameter includes an arrival azimuth, an arrival elevation, and a vector length.
- Another example can include any of the above and/or below examples where the vector length of the directed energy parameter is inversely related to the extent to which the sound energy is spread out in direction around the arrival azimuth.
- Another example can include any of the above and/or below examples where the method further comprises determining a centroid time for the incoherent sound signals and determining reverberation energy and decay time based at least on the centroid time, the encoded second perceptual parameters including the reverberation energy and the decay time.
- Another example can include a system comprising a processor and storage storing computer-readable instructions which, when executed by the processor, cause the system to receive an input sound signal for a sound source having a source location in a scene, identify encoded first perceptual acoustic parameters and encoded second perceptual acoustic parameters for a listener location in the scene, the encoded first perceptual acoustic parameters representing characteristics of coherent sound signals departing the source location and arriving at the listener location, the encoded second perceptual acoustic parameters representing characteristics of incoherent sound signals departing the source location and arriving at the listener location, render coherent sound at the listener location based at least on the input sound signal and the encoded first perceptual acoustic parameters, and render incoherent sound at the listener location based at least on the input sound signal and the encoded second perceptual acoustic parameters.
- Another example can include any of the above and/or below examples where the computer-readable instructions, when executed by the processor, cause the system to spatialize the coherent sound and the incoherent sound at the listener location.
- Another example can include any of the above and/or below examples where the computer-readable instructions, when executed by the processor, cause the system to render the coherent sound using shaped noise based at least on an echo count parameter obtained from the encoded first perceptual acoustic parameters.
- Another example can include any of the above and/or below examples where the computer-readable instructions, when executed by the processor, cause the system to render the coherent sound using shaped noise based at least on a variance time parameter obtained from the encoded first perceptual acoustic parameters.
- Another example can include any of the above and/or below examples where the computer-readable instructions, when executed by the processor, cause the system to render the incoherent sound using Gaussian noise based at least on reverberation energy and decay time parameters obtained from the encoded second perceptual acoustic parameters.
- Another example can include a computer-readable storage medium storing computer-readable instructions which, when executed, cause a processor to perform acts comprising processing directional impulse responses corresponding to sound departing from multiple sound source locations and arriving at multiple listener locations in a scene to obtain coherent sound signals and incoherent sound signals, encoding first perceptual acoustic parameters from the coherent sound signals and second perceptual acoustic parameters from the incoherent sound signals, and outputting the encoded first perceptual acoustic parameters and the encoded second perceptual acoustic parameters, the encoded first perceptual acoustic parameters providing a basis for subsequent rendering of coherent sound and the encoded second perceptual acoustic parameters providing a basis for subsequent rendering of incoherent sound traveling from various source locations to various listener locations in the scene.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Abstract
The description relates to representing acoustic characteristics of real or virtual scenes. One method involves generating directional impulse responses for a scene. The directional impulse responses can correspond to sound departing from multiple sound source locations and arriving at multiple listener locations in the scene. The method can involve processing the directional impulse responses to obtain coherent sound signals and incoherent sound signals. The method can also involve encoding first perceptual acoustic parameters from the coherent sound signals and second perceptual acoustic parameters from the incoherent sound signals, and outputting the encoded first perceptual acoustic parameters and the encoded second perceptual acoustic parameters.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202163285873P | 2021-12-03 | 2021-12-03 | |
US17/565,878 US11877143B2 (en) | 2021-12-03 | 2021-12-30 | Parameterized modeling of coherent and incoherent sound |
PCT/US2022/048640 WO2023101786A1 (fr) | 2021-12-03 | 2022-11-02 | Modélisation paramétrée de son cohérent et incohérent |
Publications (1)
Publication Number | Publication Date |
---|---|
- EP4442009A1 (fr) | 2024-10-09 |
Family
ID=84439737
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP22818955.1A Pending EP4442009A1 (fr) | 2021-12-03 | 2022-11-02 | Modélisation paramétrée de son cohérent et incohérent |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP4442009A1 (fr) |
WO (1) | WO2023101786A1 (fr) |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019054559A1 (fr) * | 2017-09-15 | 2019-03-21 | 엘지전자 주식회사 | Procédé de codage audio auquel est appliqué un paramétrage brir/rir, et procédé et dispositif de reproduction audio utilisant des informations brir/rir paramétrées |
US11595773B2 (en) * | 2019-08-22 | 2023-02-28 | Microsoft Technology Licensing, Llc | Bidirectional propagation of sound |
- 2022
- 2022-11-02 EP EP22818955.1A patent/EP4442009A1/fr active Pending
- 2022-11-02 WO PCT/US2022/048640 patent/WO2023101786A1/fr unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023101786A1 (fr) | 2023-06-08 |
Legal Events
- Date | Code | Title | Description |
- ---|---|---|---|
- | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: UNKNOWN |
- | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
- | PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
- | STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
- | 17P | Request for examination filed | Effective date: 20240522 |
- | AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |