
WO2016077514A1 - Ear centered head related transfer function system and method - Google Patents


Info

Publication number: WO2016077514A1
Application number: PCT/US2015/060259
Authority: WO (WIPO/PCT)
Prior art keywords: ear, HRTF, listener, centered, audio
Other languages: French (fr)
Inventors: David S. McGrath, Rhonda Wilson
Original assignee: Dolby Laboratories Licensing Corporation
Application filed by Dolby Laboratories Licensing Corporation
Publication of WO2016077514A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S 1/00: Two-channel systems
    • H04S 1/002: Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S 1/005: For headphones
    • H04S 2420/00: Techniques used in stereophonic systems covered by H04S but not provided for in its groups
    • H04S 2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04S 7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S 7/30: Control circuits for electronic adaptation of the sound field


Abstract

A method of creating a series of head related transfer functions for the playback of audio signals, the method including the steps of: (a) for at least one intended listener's ear of playback, and for at least one externally positioned audio source, formulating at least one normalized ear centered HRTF having substantially invariant characteristics along a radial line from the listener's ear; (b) modifying the normalized ear centered HRTF by a delay factor and an attenuation factor in accordance with the distance from a listener's ear.

Description

Ear Centered Head Related Transfer Function System and Method
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to United States Provisional Patent Application No. 62/079,648 filed 14 November 2014, which is hereby incorporated by reference in its entirety.
FIELD OF THE INVENTION
[0002] The present invention relates to the field of audio signal processing and in particular discloses a head related transfer function processing system and method for the spatialization of audio.

BACKGROUND
[0003] Any discussion of the background art throughout the specification should in no way be considered as an admission that such art is widely known or forms part of common general knowledge in the field.
[0004] Audio processing systems for the processing of audio signals for playback over headphones or the like for the purposes of externalising the audio sources to the listener are well-known.
[0005] One popular product is the Dolby headphone product (Trade Mark), which addresses the external spatialization problem. A description of one form of spatialization product is contained in United States Patent 6259795 entitled "Methods and apparatus for processing spatialized audio", the contents of which are incorporated herewith. Many other examples of audio spatialization systems exist, for example, those of United States Patents 8270616 and 6421446.
[0006] Normally, external audio spatialization systems rely upon applying head related transfer functions (HRTF) to an input audio signal to produce the left and right audio signals which account for the processing that the ear canal would carry out on the input audio signal if it was projected from an external source.

[0007] The Head Related Transfer Functions (HRTFs) are pairs of filters (referred to as a left-right pair) that are intended to mimic the way the sound of an audio object is modified as it propagates from the audio object to the left and right ears of a listener.
[0008] For the purposes of discussion, it is possible to define the following nomenclature for these filters, in the form of impulse responses:

[0009] Left Ear: $h_L(v, t), \quad v \in \mathbb{R}^3, t \in \mathbb{R}$ (1)

[0010] Right Ear: $h_R(v, t), \quad v \in \mathbb{R}^3, t \in \mathbb{R}$ (2)
[0011] The audio object location can be defined as $v = (x, y, z)$ and is defined in terms of a coordinate system where the origin of the coordinates is positioned relative to the midpoint between the listener's ears (the point, inside the listener's head, that is midway between the opening of the left and right ear-canals). Fig. 1 illustrates an example coordinate system 1 where the origin 2 is located halfway between the listener's ears.
[0012] It is desirable to provide as superior a playback system as possible for playback of audio to a listener.

SUMMARY OF THE INVENTION
[0013] It is an object of the invention to provide an improved system and method for processing head related transfer functions, for the external spatialization of audio.
[0014] In accordance with a first aspect of the present invention, there is provided a method of creating a series of head related transfer functions for the playback of audio signals, the method including the steps of: (a) for at least one intended listener's ear of playback, and for at least one externally positioned audio source, formulating at least one normalized ear centered HRTF having substantially invariant characteristics along a radial line from the listener's ear; (b) modifying the normalized ear centered HRTF by a delay factor and an attenuation factor in accordance with a distance measure from at least one of the listener's ears.

[0015] In accordance with a further aspect of the present invention, there is provided a method of spatializing an audio input signal so that it has an apparent external position when played back over headphone transducers, the method including the steps of: (a) initially forming a normalised HRTF for an audio input signal located at an external position relative to a listener, the HRTF being substantially invariant along a radial line from the listener's ear; (b) further modifying the normalised HRTF by a delay and attenuation factor in accordance with the distance of the audio source from the listener to produce a resulting ear centered HRTF; (c) utilising the ear centered HRTF to filter the audio input signal to produce an output stream which approximates the effect of projection of the audio to an ear of the listener.
[0016] The step (c) preferably can include convolution of the ear centered HRTF with the audio signal.
[0017] In accordance with a further aspect of the present invention, there is provided a method of spatializing an input audio stream to produce a spatialized audio output stream for playback over audio transducers placed near a listener's ears, the method including the steps of: (a) forming a left ear centered intermediate HRTF having substantially invariant characteristics along a radial line centered at an intended listener's left ear; (b) delaying and attenuating the left ear centered intermediate HRTF in accordance with an intended distance measure of the input audio stream from a listener's ear to produce a left scaled HRTF; (c) combining the left scaled HRTF with the input audio stream to produce a left audio output stream signal; (d) forming a right ear centered intermediate HRTF having substantially invariant characteristics along a radial line centered at an intended listener's right ear; (e) delaying and attenuating the right ear centered intermediate HRTF in accordance with an intended distance measure of the input audio stream from a listener's ear to produce a right scaled HRTF; (f) combining the right scaled HRTF with the input audio stream to produce a right audio output stream signal; and (g) outputting the left and right audio output stream signals as a spatialized audio output stream.
[0018] The steps (c) and (f) of combining can comprise convolving the corresponding HRTF with the input audio stream.

[0019] In accordance with a further aspect of the present invention, there is provided a method of creating at least a first HRTF impulse response for a sound emitter at a specified location, for at least a first ear of a virtual listener, the method including the steps of: (a) determining the location of the sound emitter relative to the first ear of the virtual listener; (b) determining a first ear relative direction of arrival, and a first ear relative distance of the sound emitter; (c) determining a first ear centered HRTF for the sound emitter, based on the first ear relative direction of arrival; and (d) forming the first HRTF impulse response from the first ear centered HRTF, including adding a delay to the first ear centered HRTF derived from a first ear relative distance and also including a gain applied to the first ear centered HRTF according to the first ear relative distance.
[0020] The method can optionally calculate the delay and gain by including a first and second ear relative distance.
[0021] In accordance with a further aspect of the present invention, there is provided a method of formulating a first ear centered HRTF impulse response for a sound emitter at a predetermined location relative to a first ear, the method including the steps of: (a) determining a first ear relative direction of arrival of the sound emitter, relative to the first ear of the virtual listener; (b) determining an undelayed ear centered HRTF impulse response, as a parameterised function of the first ear relative direction of arrival; (c) determining a head-shadow-delay, as a parameterised function of the first ear relative direction of arrival; (d) forming the first ear centered HRTF impulse response from the undelayed ear centered HRTF impulse response by the addition of the head-shadow-delay.
[0022] The methods can be applied substantially symmetrically for a first and second ear of a listener.
[0023] In accordance with a further aspect of the present invention, there is provided a method of spatializing an audio input signal so that it has an apparent external position when played back over headphone transducers, the method including the steps of: (a) forming a series of prototype normalised HRTFs for an audio input signal located at a series of external positions relative to a listener, the prototype normalised HRTFs being substantially invariant along a radial line from the listener's ear; (b) utilising a series of interpolation functions for interpolating between the series of prototype normalised HRTFs in accordance with an apparent external position relative to the listener, so as to form an undelayed ear centered HRTF; (c) calculating a delay and gain factor from the radial distance to the apparent external position, and applying the delay and gain factor to the undelayed ear centered HRTF to produce an ear centered HRTF; (d) utilising the ear centered HRTF to filter the audio input signal to produce an output stream which approximates the effect of projection of the audio to an ear of a listener.
[0024] The series of interpolation functions can comprise a series of polynomials. The series of polynomials are preferably defined in terms of a Cartesian coordinate system centered around the listener.
[0025] The method can be utilized to form both a left and right channel signal for a listener and the same series of prototype normalised HRTFs are preferably used for each ear of the listener.
[0026] The prototype normalised HRTFs are preferably stored utilising a high sample rate and the utilising step (d) preferably can include subsampling the prototype normalised HRTFs to filter them with the audio signal.
[0027] In accordance with a further aspect of the present invention, there is provided a method of spatializing a series (M) of audio input signal objects each having an apparent external position so that the signals maintain an apparent external position when played back over headphone transducers, the method including the steps of: (a) for each of the M audio input signal objects: (i) Computing a total delay and gain to be applied to a left-ear HRTF; (ii) Applying the delay and gain to the input audio signal object to produce a first ear delayed signal for the object; (iii) Interpolating a series of polynomials to produce a series (N) of scale factors and scaling the first ear delayed signal for the object, to produce N first ear delayed scaled signals for the object (b) producing combined first-ear-delayed- scaled signals, such that each of the combined first-ear-delayed-scaled signals is formed by summing together the corresponding first-ear-delayed-scaled signals for the objects; (c) filtering the combined first-ear-delayed-scaled signals through a corresponding series of prototype filters and summing the outputs of these filters to produce a first output signal for playback to a first ear.
[0028] In accordance with a further aspect of the present invention there is provided an apparatus for implementing the methods described above. In accordance with a further aspect of the present invention there is provided a computer readable storage medium storing a program of instructions that is executable by a device to perform the methods described above.
BRIEF DESCRIPTION OF THE DRAWINGS
[0029] Embodiments of the invention will now be described, by way of example only, with reference to the accompanying drawings in which:

[0030] Fig. 1 illustrates schematically the coordinate system with the origin at a point midway between the ears;

[0031] Fig. 2 illustrates schematically the coordinate system with the origin at the listener's left ear;

[0032] Fig. 3 illustrates schematically a top view of a listener and audio object;

[0033] Fig. 4 illustrates the process for the generation of Absolute HRTFs;

[0034] Fig. 5 illustrates the process for generation of Normalized HRTFs;

[0035] Fig. 6 illustrates the process for front-end ear-centered processing, computing intermediate coefficients;

[0036] Fig. 7 illustrates the process of back-end ear-centered processing with FIR filters implemented for each object, m;

[0037] Fig. 8 illustrates the back-end ear-centered processing with FIR filters operating on the results after summation of m objects; and

[0038] Fig. 9 illustrates an example rendering system utilising HRTFs generated using the embodiments of the invention.

DETAILED DESCRIPTION
[0039] One embodiment provides a system for formulating HRTF transfer functions from the audio object to each ear.

[0040] The embodiment proceeds by noting that as an audio object moves along a trajectory on a straight line that passes through, say, the listener's left ear, the transfer function (HRTF) from the audio object to the left ear will vary primarily according to a time delay ($delay_L = \frac{d_L}{c}$) and a gain ($gain_L = \frac{1}{d_L}$), with other distance-related variations being perceptually less significant.

[0041] In order to properly account for the audio object, it is important to note that the coordinate system of Fig. 1 is not the only coordinate system that may be useful. The position of the audio object may be defined in terms of ear centered coordinates and unit vectors. For example, the audio object may also be defined in terms of its location in two alternative coordinate systems:

[0042] $v_L = (x_L, y_L, z_L)$: The location of the object, relative to the listener's left ear.
[0043] $v_R = (x_R, y_R, z_R)$: The location of the object, relative to the listener's right ear.
[0044] Fig. 2 illustrates an alternative coordinate system 20 which is centered 21 on the listener's left ear.
[0045] Turning to Fig. 3, there is illustrated a top view 30 of a listener 31 and audio object 32. If the distance between the listener's ears (the diameter of the listener's head, measured from ear-to-ear) is $2d_e$, then it follows that:
[0046] $v_L = (x_L, y_L, z_L) = v - (0, d_e, 0) = (x, y - d_e, z)$ (3)

[0047] $v_R = (x_R, y_R, z_R) = v + (0, d_e, 0) = (x, y + d_e, z)$ (4)

[0048] In addition, the distance of the audio object from the midpoint between the listener's ears, and from the left ear and the right ear, respectively, can be computed as follows:

$d = \sqrt{x^2 + y^2 + z^2}$ (5)

$d_L = \sqrt{x_L^2 + y_L^2 + z_L^2}$ (6)

$d_R = \sqrt{x_R^2 + y_R^2 + z_R^2}$ (7)

[0049] It is therefore possible to compute normalized unit vectors:

$v' = (x', y', z') = \left(\frac{x}{d}, \frac{y}{d}, \frac{z}{d}\right)$ (8)

$v'_L = (x'_L, y'_L, z'_L) = \left(\frac{x_L}{d_L}, \frac{y_L}{d_L}, \frac{z_L}{d_L}\right)$ (9)

$v'_R = (x'_R, y'_R, z'_R) = \left(\frac{x_R}{d_R}, \frac{y_R}{d_R}, \frac{z_R}{d_R}\right)$ (10)
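By way of illustration, the geometry of Equations 3-10 reduces to a few lines of code. The following is a minimal sketch assuming NumPy; the function name and the default half ear-spacing of 0.0875 m are illustrative assumptions, not values taken from the specification:

```python
import numpy as np

def ear_geometry(v, d_e=0.0875):
    """Ear-relative locations, distances and unit vectors (Eqs. 3-10).
    v: object location (x, y, z) in head-centered coordinates.
    d_e: half the ear-to-ear distance, in metres (assumed value)."""
    v = np.asarray(v, dtype=float)
    v_L = v - np.array([0.0, d_e, 0.0])                  # Eq. 3
    v_R = v + np.array([0.0, d_e, 0.0])                  # Eq. 4
    d = np.linalg.norm(v)                                # Eq. 5
    d_L, d_R = np.linalg.norm(v_L), np.linalg.norm(v_R)  # Eqs. 6-7
    return d, d_L, d_R, v / d, v_L / d_L, v_R / d_R      # Eqs. 8-10
```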
Absolute HRTFs compared with Normalized HRTFs

[0050] HRTF Impulse Responses may be modified in various ways, to suit the requirements of different applications. For example, as an audio object moves closer to the listener, an Absolute HRTF Impulse response will vary in gain and delay, emulating the real-world, wherein a closer object will be louder, and the time delay from the audio object to the listener's ears will be shorter, as a result of the finite speed of sound. It is therefore possible to define a series of terms:
[0051] Absolute HRTFs: A left-right pair of HRTF filters, representative of an audio object located at some position relative to the listener, that includes a delay that is representative of the time taken for sound to travel from the audio object to the listener, and a gain that is representative of the attenuation incurred as a result of the distance of the audio object from the listener.

[0052] Delay-normalized HRTFs: A left-right pair of HRTF filters that do not include a delay that is representative of the time taken for sound to travel from the audio object to the listener. It is common, in the art, to normalize the delays of left-right HRTF pairs such that the first non-zero tap of either the left or right impulse response occurs close to time zero.
[0053] Gain-normalized HRTFs: A left-right pair of HRTF filters that do not include a gain that is representative of the attenuation incurred as a result of the distance of the audio object from the listener. It is common, in the art, to normalize the gains of left-right HRTF pairs such that the average gain of the left-right filters at low frequencies is approximately 1.
[0054] Normalized HRTFs: A pair of left-right HRTFs that are both Delay-normalized and Gain-normalized.
[0055] Whilst different embodiments can utilise HRTFs in different forms during processing, it is assumed that, for final convolution purposes, it is necessary to generate Absolute HRTFs. It will be appreciated that Delay-normalization or Gain-normalization is a commonly understood process in the art, and hence the novel aspects of the invention, as applied to Absolute HRTFs, will also apply to Delay-normalized and/or Gain-normalized HRTFs.
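As a rough sketch of how a measured left-right impulse-response pair might be Delay-normalized and Gain-normalized in the sense defined above (the onset threshold, the low-frequency cut-off and the averaging scheme here are illustrative assumptions):

```python
import numpy as np

def normalize_hrtf_pair(h_L, h_R, fs=48000, lf_cut=400.0):
    """Delay-normalize and gain-normalize a left-right HRIR pair:
    shift both filters so the first significant tap of either one
    lands at time zero, then scale both so the average low-frequency
    gain is approximately 1. Assumes h_L and h_R share one length."""
    thresh = 1e-4 * max(np.abs(h_L).max(), np.abs(h_R).max())
    onset = min(np.argmax(np.abs(h_L) > thresh),
                np.argmax(np.abs(h_R) > thresh))
    h_L, h_R = h_L[onset:], h_R[onset:]        # delay normalization
    f = np.fft.rfftfreq(len(h_L), 1.0 / fs)
    lf = f < lf_cut                            # low-frequency bins
    g = 0.5 * (np.abs(np.fft.rfft(h_L))[lf].mean() +
               np.abs(np.fft.rfft(h_R))[lf].mean())
    return h_L / g, h_R / g                    # gain normalization
```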
[0056] Fig. 3 shows the arrangement of the audio object 32 and the subject 31 used in the measurement of the HRTF filters for an audio object 32 located at $v = (x, y, z)$.
[0057] Parallax
[0058] If the audio object 32 moves closer to a listener, along a trajectory 33 that follows a straight line through the midpoint between the listener's ears, the direction-of-arrival unit-vector, $v'$, will not change, but the Absolute HRTFs will change. The most dramatic changes in the Absolute HRTFs will be the delay and gain changes, and if these changes are removed (by normalizing the HRTFs), the resulting normalized HRTFs will still exhibit some changes as a function of distance. These changes occur for many reasons, including the following:

1. the direction of the audio object, relative to each of the listener's ears, will vary as the audio object approaches the listener, due to parallax (the change in angular position of an external object, as the viewpoint is shifted). In the embodiment, the generation of the HRTF filters takes account of this parallax.
2. the sound from the audio object, as it is incident on the listener's ears, will vary from being a plane-wave (when the audio object is a large distance from the listener) to a spherical-wave (as the audio object comes closer to the listener). Again, the plane-wave/spherical-wave variation in the HRTFs, which is particularly significant for audio objects very close to the listener (less than 50 cm, say), is also accounted for in an alternative embodiment through the manipulation of a residual filter which is designed to account for the near field effects.
[0059] Normalized ear-centered HRTFs
[0060] If an audio object 32 is to be located at a direction defined by the unit-vector $v'_L$ (34), at a distance $d_L$ from the listener's left ear, this will correspond to $v_L = d_L \times v'_L$, and hence, the position of the audio object relative to the midpoint between the listener's ears will be:

$v = (0, d_e, 0) + d_L \times v'_L$ (11)
[0061] The transfer function from this audio object to the listener's left ear will be $h_L(v, t)$, as defined previously. As the distance $d_L$ increases, $h_L(v, t)$ will exhibit a delay that increases with distance, and a gain that varies inversely with distance. By removing these delay and gain artefacts, the embodiment creates a normalized far-field HRTF.
[0062] A new 'ear centered' filter function, $h_L^{EC}(v'_L, t)$, can be defined based on the far-field HRTFs, as follows:

$h_L^{EC}(v'_L, t) = \lim_{d_L \to \infty} d_L \times h_L\left((0, d_e, 0) + d_L v'_L,\; t + \frac{d_L}{c}\right)$ (12)

[0063] The equation above effectively removes the distance-related gain and delay from the far-field HRTFs.
[0064] The term Normalized ear-centered HRTF is defined to refer to the filter $h_L^{EC}(v'_L, t)$. Note that this filter is a function of $v'_L$ (34), the unit vector that indicates the direction of the audio object relative to the listener's left ear.
[0065] Now, for smaller values of $d_L$, it is desirable to introduce a residual filter $r(v_L, t)$ such that:

$h_L(v, t) = \frac{1}{d_L} h_L^{EC}\left(v'_L, t - \frac{d_L}{c}\right) \otimes r(v_L, t)$ (13)

[0066] In the far field, this residual filter will have no effect, i.e.:

$\lim_{d_L \to \infty} r(v_L, t) = \delta(t)$ (14)

[0067] The filter $r(v_L, t)$ can be a close approximation to $\delta(t)$ for $d_L > 1$ m, and even for values of $d_L < 1$ metre, a reasonable approximation to the HRTF filter may be made by assuming that $r(v_L, t) = \delta(t)$. Hence, we may say:

$h_L(v, t) \approx \frac{1}{d_L} \times h_L^{EC}\left(v'_L, t - \frac{d_L}{c}\right)$ (15)

[0068] and, for the right ear, the corresponding HRTF filter is:

$h_R(v, t) \approx \frac{1}{d_R} \times h_R^{EC}\left(v'_R, t - \frac{d_R}{c}\right)$ (16)
[0069] Equation 15 is effectively saying that the HRTF, $h_L(v, t)$, may be approximated by applying a gain and a time-shift to the Normalized ear-centered HRTF (and likewise, for the right ear, as per Equation 16).

[0070] An implementation of a system implementing the method of Equation 15 for both left and right ears is illustrated 40 in Fig. 4. The method utilizes the fact that the Normalized ear-centered HRTF, $h_L^{EC}(v'_L, t)$, is simpler to generate because it is a function of the unit-vector $v'_L$, but not a function of the radial distance $d_L$.

[0071] Taking the symmetrical case of the left Absolute HRTF, given the input 41 is $v$ in head-centered coordinates, the first step is to add the offset 42 for conversion to left ear coordinates $v_L$. Subsequently, from $v_L$, $v'_L$ and $d_L$ are calculated 43. From these, $h_L^{EC}(v'_L, t)$, $del_L^{EC}$ and $gain_L^{EC}$ can be formed 44. The conversion of the Normalized ear-centered HRTF $h_L^{EC}(v'_L, t)$ to the final HRTF is done by adding 45 the delay $del_L^{EC}$ and scaling 46 by the gain $gain_L^{EC}$. Similar operations can be carried out for the symmetrical right ear case.
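A minimal sketch of Equation 15 applied to a sampled impulse response follows; for simplicity the delay is rounded to a whole number of samples (sub-sample delays are discussed later, under time-sampled systems), and all names are illustrative:

```python
import numpy as np

def absolute_hrir(h_ec, d, fs=48000, c=343.0):
    """Eq. 15: form an Absolute HRIR from a normalized ear-centered
    HRIR h_ec by applying the gain 1/d and the delay d/c, where d is
    the ear-to-object distance in metres."""
    shift = int(round(fs * d / c))   # del^EC, in samples
    out = np.zeros(shift + len(h_ec))
    out[shift:] = h_ec / d           # gain^EC = 1/d
    return out
```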
Generating Normalized HRTF filters
[0072] In order to generate the normalized HRTFs, the quantities $del_L^{EC}$, $del_R^{EC}$, $gain_L^{EC}$ and $gain_R^{EC}$ need to be computed by different means, as illustrated 50 in Fig. 5. This processing takes place inside the box labelled "Distance Normalization" 51 in Fig. 5. For example, the delays may be computed according to two primary methods:

[0073] Without Delay Normalization:

$del_L^{EC} = \frac{d_L}{c}$ (17)

$del_R^{EC} = \frac{d_R}{c}$ (18)

[0074] With Delay Normalization:

$d_N = \min(d_L, d_R)$ (19)

$del_L^{EC} = \frac{d_L - d_N}{c}$ (20)

$del_R^{EC} = \frac{d_R - d_N}{c}$ (21)

[0075] Likewise, the gain calculations can be computed according to two primary methods:

[0076] Without Gain Normalization:

$gain_L^{EC} = \frac{1}{d_L}$ (22)

$gain_R^{EC} = \frac{1}{d_R}$ (23)

[0077] With Gain Normalization:

$d_N = \min(d_L, d_R)$ (24)

$gain_L^{EC} = \frac{d_N}{d_L}$ (25)

$gain_R^{EC} = \frac{d_N}{d_R}$ (26)

[0078] In both cases (delay-normalization and gain-normalization), the normalization is performed by taking into account $d_N$, the distance of the audio object to the nearest ear of the listener. Alternative embodiments may make use of other normalization variables, including $d$ (the distance of the audio object from the midpoint between the listener's ears). Equations 20 and 21 imply that one HRTF (the near ear) will have zero delay added, whilst the other ear will have the relative delay added, ensuring that the correct inter-aural delay is present in the final HRTF pair. Furthermore, Equations 25 and 26 imply that one HRTF (the near ear) will have unity gain applied, whilst the other ear will have a gain less than unity applied, ensuring that the correct inter-aural gain is present in the final HRTF pair.
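The two delay methods (Equations 17-21) and the two gain methods (Equations 22-26) reduce to a few lines; a sketch, with illustrative names:

```python
def distance_delays_gains(d_L, d_R, c=343.0, normalize=True):
    """Compute del^EC and gain^EC per Equations 17-26. With
    normalization, the near ear receives zero delay and unity gain."""
    if normalize:
        d_N = min(d_L, d_R)                              # Eqs. 19, 24
        del_L, del_R = (d_L - d_N) / c, (d_R - d_N) / c  # Eqs. 20-21
        gain_L, gain_R = d_N / d_L, d_N / d_R            # Eqs. 25-26
    else:
        del_L, del_R = d_L / c, d_R / c                  # Eqs. 17-18
        gain_L, gain_R = 1.0 / d_L, 1.0 / d_R            # Eqs. 22-23
    return del_L, del_R, gain_L, gain_R
```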
The ear-centered HRTF polynomial method

[0079] Fig. 5 includes the blocks referred to as $EC_L$ 52 and $EC_R$. The $EC_L$ block 52, for example, is responsible for converting from $v'_L$ to $h_L^{EC}(v'_L, t)$:

$EC_L: v'_L \to h_L^{EC}(v'_L, t)$ (27)
[0080] A polynomial method for the generation of the ear-centered HRTFs can operate as follows (for the symmetrical case of left-ear HRTF):
[0081] 1. Input is in the form of a unit vector: $v'_L = (x'_L, y'_L, z'_L)$

[0082] 2. Compute N polynomial functions, $p_n(x, y, z)$:

$p_n(x, y, z) = \sum_{i,j,k} a_{n,i,j,k}\, x^i y^j z^k \qquad n = 1, 2, \ldots, N$ (28)

[0083] 3. Sum together N pre-defined prototype filter responses, weighted by the values of the N polynomials evaluated at $(x'_L, y'_L, z'_L)$, to produce the un-delayed ear-centered HRTF:

$h_L^u(v'_L, t) = \sum_{n=1}^{N} p_n(x'_L, y'_L, z'_L)\, proto_{L,n}(t)$ (29)

[0084] 4. Evaluate the weighted sum of the N polynomials evaluated at $(x'_L, y'_L, z'_L)$, to produce a head-shadow-delay:

$del_L^u(v'_L) = \sum_{n=1}^{N} p_n(x'_L, y'_L, z'_L)\, u_{L,n}$ (30)

[0085] 5. Produce an ear-centered HRTF, by adding a delay to the un-delayed ear-centered HRTF:

$h_L^{EC}(v'_L, t) = h_L^u(v'_L, t - del_L^u(v'_L))$ (31)

[0086] In a similar fashion, it is possible to compute the right-ear-centered HRTF (commonly, it is possible to use the same N polynomial functions as used for the left-ear HRTF process):

$h_R^u(v'_R, t) = \sum_{n=1}^{N} p_n(x'_R, y'_R, z'_R)\, proto_{R,n}(t)$ (32)

$del_R^u(v'_R) = \sum_{n=1}^{N} p_n(x'_R, y'_R, z'_R)\, u_{R,n}$ (33)

$h_R^{EC}(v'_R, t) = h_R^u(v'_R, t - del_R^u(v'_R))$ (34)
[0087] Commonly, the choice of the N polynomials will be made to ensure that they are simple and easy to evaluate. For example, the following set of N = 13 polynomials has been found to provide a good basis for producing reasonable approximations to ear-centered HRTF filters:
$p_1(x, y, z) = 1$ (35)

$p_2(x, y, z) = x$ (36)

$p_3(x, y, z) = y$ (37)

$p_4(x, y, z) = z$ (38)

$p_5(x, y, z) = x^2$ (39)

$p_6(x, y, z) = xz$ (40)

$p_7(x, y, z) = xy$ (41)

$p_8(x, y, z) = yz$ (42)

$p_9(x, y, z) = y^2$ (43)

$p_{10}(x, y, z) = y^2 z$ (44)

$p_{11}(x, y, z) = x y^2$ (45)

$p_{12}(x, y, z) = y^3$ (46)

$p_{13}(x, y, z) = y^4$ (47)
[0088] Once the N polynomials have been pre-defined, the ear-centered HRTFs are then specified by the following information:
[0089] - The N left-ear prototype filters, $proto_{L,n}(t)$

[0090] - The N left-ear head-shadow delay weights, $u_{L,n}$

[0091] - The N right-ear prototype filters, $proto_{R,n}(t)$

[0092] - The N right-ear head-shadow delay weights, $u_{R,n}$
[0093] If the HRTFs are left-right symmetric (so that $h_L((x, y, z), t) = h_R((x, -y, z), t)$), then it is possible to compute the right ear-centered HRTFs using the left-ear prototype filters and the left-ear head-shadow delay weights, as follows:

$h_R^u(v'_R, t) = \sum_{n=1}^{N} p_n(x'_R, -y'_R, z'_R)\, proto_{L,n}(t)$ (48)

$del_R^u(v'_R) = \sum_{n=1}^{N} p_n(x'_R, -y'_R, z'_R)\, u_{L,n}$ (49)

$h_R^{EC}(v'_R, t) = h_R^u(v'_R, t - del_R^u(v'_R))$ (50)
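A sketch of the polynomial method of Equations 28-31, using the N = 13 monomials of Equations 35-47; the array shapes and names are illustrative assumptions (`protos` holding the N prototype filters as rows, `delay_weights` holding the weights $u_{L,n}$ in seconds), and the head-shadow delay is rounded to whole samples for brevity:

```python
import numpy as np

def ec_hrtf_poly(u, protos, delay_weights, fs=48000):
    """Polynomial ear-centered HRTF (Eqs. 29-31).
    u: unit vector (x', y', z'); protos: (N, taps) prototype filters;
    delay_weights: (N,) head-shadow delay weights, in seconds."""
    x, y, z = u
    p = np.array([1, x, y, z, x*x, x*z, x*y, y*z,
                  y*y, y*y*z, x*y*y, y**3, y**4])  # Eqs. 35-47
    h_u = p @ protos                   # Eq. 29: weighted prototype sum
    del_u = float(p @ delay_weights)   # Eq. 30: head-shadow delay
    shift = int(round(del_u * fs))     # Eq. 31, to whole samples
    return np.concatenate([np.zeros(shift), h_u])
```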
Using impulse responses in a time-sampled system
[0094] For practical reasons, the embodiment operates on audio signals as time-sampled digital signals. In this case, the impulse responses will also be time-sampled. The examples given previously are described in terms of continuous-time impulse responses, but it will be appreciated by those skilled in the art that equivalent discrete-time impulse responses may be used in places where the continuous-time functions are described here.
[0095] Some added attention is required in the implementation of time-delay operations, when the time delays are not an integer number of sample-periods. Once again, those skilled in the art will be familiar with methods by which this may be done, including the following possibilities:
[0096] · All filter responses may be manipulated in the frequency domain, and hence the addition of an arbitrary time-delay may be implemented by applying a frequency-dependent phase shift to the frequency-domain filter samples.
[0097] · The filter responses may be stored as higher-sample-rate impulse responses, so that time-shifts may be implemented with sub-sample accuracy (with the final HRTF filters being decimated after they have been generated by the algorithms described herein).
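A sketch of the first option, a frequency-dependent phase shift; this assumes the response carries enough trailing zero-padding that the implied circular shift does not wrap:

```python
import numpy as np

def fractional_delay(h, delay_samples):
    """Delay an impulse response by a possibly non-integer number of
    samples by applying a linear phase ramp in the frequency domain."""
    n = len(h)
    H = np.fft.rfft(h)
    k = np.arange(len(H))  # bin k lies at frequency k/n cycles/sample
    H *= np.exp(-2j * np.pi * k * delay_samples / n)
    return np.fft.irfft(H, n)
```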
Implementation of the HRTFs as filterbanks

[0098] In some applications, a large number of audio objects are to be rendered for playback to a listener, with each audio object processed with a pair of left-right HRTFs corresponding to a different location. Assuming M objects are to be rendered in this way, according to the method described above, the M objects will be rendered as follows:
[0099] 1. For each object, m = 1, 2, ..., M:

[00100] (a) Compute the left ear HRTF for the location of object m. This process will include the evaluation of the N polynomials, and the weighted summing of the N $proto_{L,n}(t)$ responses.

[00101] (b) Compute the right ear HRTF for the location of object m. This process will include the evaluation of the N polynomials, and the weighted summing of the N $proto_{R,n}(t)$ responses.

[00102] (c) Filter the audio signal for object m by the left and right HRTFs, to produce the left and right intermediate signals, respectively, for object m.

[00103] 2. Sum together the left intermediate signals for the M objects to produce the left output signal.

[00104] 3. Sum together the right intermediate signals for the M objects to produce the right output signal.
[00105] The process outlined above will require 2M filters to be applied to audio signals, and this filtering process may require excessive compute power. If N < M, it is possible to simplify the filtering requirements by modifying the procedure as follows:
[00106] 1. For each object, m = 1, 2, ..., M:

[00107] (a) Compute the total delay that will be applied to the left-ear HRTF: $del_{tmp} = del_L^u + del_L^{EC}$, together with the gain $gain_L^{EC}$.

[00108] (b) Process the audio signal for object m by applying the delay, $del_{tmp}$, and gain, $gain_L^{EC}$, to produce the left-ear-delayed signal for object m.

[00109] (c) Evaluate the N polynomials, $p_n(x'_L, y'_L, z'_L)$, to produce N scale-factors, and scale the left-ear-delayed signal for object m by each of these N scale factors to produce N left-ear-delayed-scaled signals for object m.

[00110] (d) Compute the total delay that will be applied to the right-ear HRTF: $del_{tmp} = del_R^u + del_R^{EC}$, together with the gain $gain_R^{EC}$.

[00111] (e) Process the audio signal for object m by applying the delay, $del_{tmp}$, and gain, $gain_R^{EC}$, to produce the right-ear-delayed signal for object m.

[00112] (f) Evaluate the N polynomials, $p_n(x'_R, y'_R, z'_R)$, to produce N scale-factors, and scale the right-ear-delayed signal for object m by each of these N scale factors to produce N right-ear-delayed-scaled signals for object m.
[00113] 2. Produce N combined left-ear-delayed-scaled signals, such that each of the N combined left-ear-delayed-scaled signals is formed by summing together the corresponding left-ear-delayed-scaled signals for the M objects.

[00114] 3. Produce N combined right-ear-delayed-scaled signals, such that each of the N combined right-ear-delayed-scaled signals is formed by summing together the corresponding right-ear-delayed-scaled signals for the M objects.
[00115] 4. Filter the N combined left-ear-delayed-scaled signals through the corresponding N prototype filters $proto_{L,n}(t)$ and sum the outputs of these N filters to produce the left output signal.

[00116] 5. Filter the N combined right-ear-delayed-scaled signals through the corresponding N prototype filters $proto_{R,n}(t)$ and sum the outputs of these N filters to produce the right output signal.
[00117] This process uses 2N filters, so, as the number of objects M grows, the filtering cost remains constant.
[00118] Both of these processes outlined above can be implemented by a common front-end process, as shown in Fig. 6. The back-end processing will be different for each of the methods, as shown in Figs. 7 and 8.
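A compact sketch of the 2N-filter back-end just described, for the left ear only; the names and shapes are illustrative assumptions (`protos` holds the N = 13 prototype filters matching the monomials above, and the per-object `units`, `delays` and `gains` come from the front-end):

```python
import numpy as np
from scipy.signal import fftconvolve

def render_left(signals, units, delays, gains, protos, fs=48000):
    """Left-ear rendering of M objects through N shared prototype
    filters. delays are total per-object delays (del^u + del^EC),
    in seconds; gains are the corresponding gain^EC values."""
    N, taps = protos.shape
    length = max(len(s) for s in signals) + int(round(fs * max(delays)))
    buses = np.zeros((N, length))  # combined delayed-scaled signals
    for sig, u, dly, g in zip(signals, units, delays, gains):
        shift = int(round(dly * fs))
        delayed = np.zeros(length)
        delayed[shift:shift + len(sig)] = g * sig  # steps (a)-(b)
        x, y, z = u
        p = np.array([1, x, y, z, x*x, x*z, x*y, y*z,
                      y*y, y*y*z, x*y*y, y**3, y**4])  # step (c)
        buses += p[:, None] * delayed              # step 2: bus sums
    # step 4: one FIR filter per bus, then sum into the output
    return sum(fftconvolve(buses[n], protos[n]) for n in range(N))
```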
Implementation
[00119] Once the left and right HRTFs have been calculated for a particular audio source location, they can be applied in a standard manner to a corresponding audio source input to produce an audio output for playback to a user. For example, Fig. 9 shows one such arrangement 90, where the audio source 91 for a particular location is duplicated for left and right channels. Taking the symmetrical case of the left channel, it is loaded into an FIR filter 92 where it is convolved with the corresponding HRTF 93 calculated as aforementioned. The convolved output forms one spatialized audio source output which is summed 94 with other outputs to produce an overall left speaker output 95.
[00120] The corresponding right output 96 is also calculated in an analogous manner.
[00121] Depending on the computational resources available, the arrangement of Fig. 9 can be implemented in a real time or batch manner. When implemented in a real time manner, the playback can be to a series of headphone transducers, allowing a user to listen to spatialised audio. In a batch manner, the audio output 95, 96 can be stored for subsequent playback to a user at a later time, with the playback requiring less onerous computational resources.
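For completeness, a sketch of the direct Fig. 9 style render, with each source convolved against its pair of Absolute HRIRs and the results summed into two speaker feeds (assuming all sources share one length, and all HRIRs another):

```python
from scipy.signal import fftconvolve

def render_direct(sources, hrirs_L, hrirs_R):
    """Per-object FIR filtering and summation, as in Fig. 9."""
    left = sum(fftconvolve(s, h) for s, h in zip(sources, hrirs_L))
    right = sum(fftconvolve(s, h) for s, h in zip(sources, hrirs_R))
    return left, right
```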
[00122] In overall summary, it can be seen that the embodiments provide a means for computing the HRTF impulse responses defined in Equations 1 and 2, for a given audio object location $v = (x, y, z)$. Furthermore, the embodiments provide the following benefits: the calculation of the impulse responses uses a small amount of pre-stored data; the calculation makes use of rapid arithmetic operations; and the resulting filters can be implemented in a manner that provides efficiency in the audio processing.
[00123] In further summary, it can be seen that the embodiments provide for:
[00124] 1. The formulation of the left and right ear HRTF impulse responses for a sound emitter at a specified location, v , by applying the following steps: [00125] (a) determining the left-ear-relative sound location, vL = v - (0, de,0) (the location of the sound emitter relative to the left ear of the virtual listener);
[00126] (b) determining the right-ear-relative sound location, vR = v + (0, de ,0) (the location of the sound emitter relative to the right ear of the virtual listener);
[00127] (c) determining the left-ear-relative direction of arrival, v (a unit-vector indicating the direction of the sound emitter relative to the left ear) as well as the left-ear- distance dL , being the Euclidean norm of the left-ear-relative sound location,
Figure imgf000022_0001
[00128] (d) determining the right-ear-relative direction of arrival, v 'R (a unit-vector indicating the direction of the sound emitter relative to the right ear) as well as the right- ear-distance, dR , being the Euclidean norm of the right-ear-relative sound location,
Figure imgf000022_0002
[00129] (e) determining the left-ear-centered HRTF for the object, L E c (v , t) , based on it's left-ear-relative direction of arrival, v 'L ;
[00130] (f) determining the right-ear-centered HRTF for the object, h (v 'R , t) , based on it's right-ear-relative direction of arrival, v 'R ;.
[00131 ] (g) forming the left HRTF from the left-ear-centered HRTF, including adding a delay to the left Ear centered HRTF derived from the left-ear-distance: delay L =— , and also including a gain applied to the left ear HRTF according to the left- distance: gainL
[00132] (h) forming the right HRTF from the right-ear-centered HRTF, including adding a delay to the right-ear-centered HRTF derived from the right-ear distance, delay_R = d_R / c, and also including a gain applied to the right ear HRTF according to the right-ear distance, gain_R = 1 / d_R. (A code sketch of steps (a) to (h) is given below.)
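A minimal Python sketch of steps (a) to (h) follows, assuming a speed of sound c = 343 m/s, a half ear-spacing d_e = 0.09 m, and a caller-supplied ear_centered_hrtf() lookup standing in for step 2 below; these constants and names are assumptions for illustration, not values taken from the specification.

```python
import numpy as np

C = 343.0    # assumed speed of sound in m/s
D_E = 0.09   # assumed half ear-spacing d_e in metres

def ear_relative(v, d_e=D_E):
    """Steps (a)-(d): ear-relative locations, unit directions and distances."""
    v = np.asarray(v, dtype=float)
    v_l = v - np.array([0.0, d_e, 0.0])   # (a) left-ear-relative location v_L
    v_r = v + np.array([0.0, d_e, 0.0])   # (b) right-ear-relative location v_R
    d_l = np.linalg.norm(v_l)             # (c) left-ear distance d_L = ||v_L||
    d_r = np.linalg.norm(v_r)             # (d) right-ear distance d_R = ||v_R||
    return v_l / d_l, d_l, v_r / d_r, d_r

def form_hrtfs(v, ear_centered_hrtf, fs=48000):
    """Steps (e)-(h): look up ear-centered HRTFs, then apply delay and gain."""
    u_l, d_l, u_r, d_r = ear_relative(v)
    h_l = ear_centered_hrtf(u_l)          # (e) h_L^EC(v'_L, t)
    h_r = ear_centered_hrtf(u_r)          # (f) h_R^EC(v'_R, t)
    n_l = int(round(fs * d_l / C))        # (g) delay_L = d_L / c, in samples
    n_r = int(round(fs * d_r / C))        # (h) delay_R = d_R / c, in samples
    h_l = np.concatenate([np.zeros(n_l), h_l]) / d_l   # gain_L = 1 / d_L
    h_r = np.concatenate([np.zeros(n_r), h_r]) / d_r   # gain_R = 1 / d_R
    return h_l, h_r
```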
[00133] 2. The formulation of an ear-centered HRTF impulse response (for example, the left-ear-centered HRTF impulse response) for a sound emitter at a specified location relative to the left ear, v_L, by applying the following steps:
[00134] (a) determining the left-ear-relative direction of arrival of the sound emitter, v'_L = (x'_L, y'_L, z'_L), being a unit vector pointing in the direction of the sound emitter relative to the left ear of the virtual listener;
[00135] (b) determining the un-delayed ear-centered HRTF impulse response, h_L^U(v'_L, t), as a polynomial function of the x'_L, y'_L, z'_L unit-vector coordinates;
[00136] (c) determining the head-shadow delay, del_L^U(v'_L), as a polynomial function of the x'_L, y'_L, z'_L unit-vector coordinates;
[00137] (d) forming the ear-centered HRTF impulse response from the un-delayed ear-centered HRTF impulse response by the addition of the head-shadow delay: h_L(v'_L, t) = h_L^U(v'_L, t - del_L^U(v'_L)).
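One way step 2 could be realised is sketched below, assuming the un-delayed ear-centered HRTF is stored as N prototype impulse responses that are mixed by polynomial weights in the unit-vector coordinates (x'_L, y'_L, z'_L); the particular basis, coefficient arrays and sample rate are placeholders rather than the pre-stored data described in the specification.

```python
import numpy as np

def poly_basis(u):
    """A low-order polynomial basis in the direction-of-arrival coordinates."""
    x, y, z = u
    return np.array([1.0, x, y, z, x * y, y * z, z * x, x * x, y * y])

def undelayed_hrtf(u, prototypes, weights):
    """h^U(v', t): mix N prototype filters with polynomial scale factors.

    prototypes : (N, taps) array of pre-stored impulse responses
    weights    : (N, 9) array of polynomial coefficients, one row per prototype
    """
    scale = weights @ poly_basis(u)   # one scale factor per prototype
    return scale @ prototypes         # resulting (taps,) impulse response

def head_shadow_delay(u, delay_coeffs, fs=48000):
    """del^U(v'): head-shadow delay as a polynomial in (x', y', z'), in samples."""
    return int(round(fs * (delay_coeffs @ poly_basis(u))))

def ear_centered_hrtf(u, prototypes, weights, delay_coeffs, fs=48000):
    """h(v', t) = h^U(v', t - del^U(v')): prepend the head-shadow delay."""
    h = undelayed_hrtf(u, prototypes, weights)
    return np.concatenate([np.zeros(head_shadow_delay(u, delay_coeffs, fs)), h])
```

For instance, with N = 4 prototypes of 256 taps each, prototypes is a (4, 256) array, weights a (4, 9) array and delay_coeffs a length-9 vector.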
Interpretation
[00138] Reference throughout this specification to "one embodiment", "some embodiments" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, appearances of the phrases "in one embodiment", "in some embodiments" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment, but may. Furthermore, the particular features, structures or characteristics may be combined in any suitable manner, as would be apparent to one of ordinary skill in the art from this disclosure, in one or more embodiments.
[00139] As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
[00140] In the claims below and the description herein, any one of the terms comprising, comprised of or which comprises is an open term that means including at least the elements/features that follow, but not excluding others. Thus, the term comprising, when used in the claims, should not be interpreted as being limitative to the means or elements or steps listed thereafter. For example, the scope of the expression a device comprising A and B should not be limited to devices consisting only of elements A and B. Any one of the terms including or which includes or that includes as used herein is also an open term that also means including at least the elements/features that follow the term, but not excluding others. Thus, including is synonymous with and means comprising.
[00141 ] As used herein, the term "exemplary" is used in the sense of providing examples, as opposed to indicating quality. That is, an "exemplary embodiment" is an embodiment provided as an example, as opposed to necessarily being an embodiment of exemplary quality.
[00142] It should be appreciated that in the above description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, FIG., or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this invention.
[00143] Furthermore, while some embodiments described herein include some but not other features included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention, and form different embodiments, as would be understood by those skilled in the art. For example, in the following claims, any of the claimed embodiments can be used in any combination.
[00144] Furthermore, some of the embodiments are described herein as a method or combination of elements of a method that can be implemented by a processor of a computer system or by other means of carrying out the function. Thus, a processor with the necessary instructions for carrying out such a method or element of a method forms a means for carrying out the method or element of a method. Furthermore, an element described herein of an apparatus embodiment is an example of a means for carrying out the function performed by the element for the purpose of carrying out the invention.
[00145] In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
[00146] Similarly, it is to be noticed that the term coupled, when used in the claims, should not be interpreted as being limited to direct connections only. The terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms are not intended as synonyms for each other. Thus, the scope of the expression a device A coupled to a device B should not be limited to devices or systems wherein an output of device A is directly connected to an input of device B. It means that there exists a path between an output of A and an input of B which may be a path including other devices or means. "Coupled" may mean that two or more elements are either in direct physical or electrical contact, or that two or more elements are not in direct contact with each other but yet still co-operate or interact with each other.
[00147] Thus, while there have been described exemplary embodiments of the invention, those skilled in the art will recognize that other and further modifications may be made thereto without departing from the spirit of the invention, and it is intended to claim all such changes and modifications as falling within the scope of the invention. For example, any formulas given above are merely representative of procedures that may be used. Functionality may be added to or deleted from the block diagrams, and operations may be interchanged among functional blocks. Steps may be added to or deleted from the methods described within the scope of the present invention.

CLAIMS:
1. A method of creating a series of head related transfer functions for the playback of audio signals, the method including the steps of:
(a) for at least one intended listener's ear of playback, and for at least one externally positioned audio source, formulating at least one normalized ear centered HRTF having substantially invariant characteristics along a radial line from the listener's ear; and
(b) modifying the normalized ear centered HRTF by a delay factor and an attenuation factor in accordance with a distance measure from at least one of the listener's ears.
2. A method of spatializing an audio input signal so that it has an apparent external position when played back over headphone transducers, the method including the steps of:
(a) initially forming a normalised HRTF for an audio input signal located at an external position relative to a listener, the HRTF being substantially invariant along a radial line from the listener's ear;
(b) further modifying the normalised HRTF by a delay and attenuation factor in accordance with the distance of the audio source from the listener to produce a resulting ear centered HRTF; and
(c) utilising the ear centered HRTF to filter the audio input signal to produce an output stream which approximates the effect of projection of the audio to an ear of the listener.
3. A method as claimed in claim 2 wherein said step (c) includes convolution of the ear centered HRTF with the audio signal.
4. A method of spatializing an input audio stream to produce a spatialized audio output stream for playback over audio transducers placed near a listener's ears, the method including the steps of: (a) forming a left ear centered intermediate HRTF having substantially invariant characteristics along a radial line centered at an intended listener's left ear;
(b) delaying and attenuating the left ear centered intermediate HRTF in accordance with an intended distance measure of the input audio stream from a listener's ear to produce a left scaled HRTF;
(c) combining the left scaled HRTF with the input audio stream to produce a left audio output stream signal;
(d) forming a right ear centered intermediate HRTF having substantially invariant characteristics along a radial line centered at an intended listener's right ear;
(e) delaying and attenuating the right ear centered intermediate HRTF in accordance with an intended distance measure of the input audio stream from a listener's ear to produce a right scaled HRTF;
(f) combining the right scaled HRTF with the input audio stream to produce a right audio output stream signal; and
(g) outputting the left and right audio output stream signals as a spatialized audio output stream.
5. A method as claimed in claim 4 wherein said steps (c) and (f) of combining comprise convolving the corresponding HRTF with the input audio stream.
6. A method of creating at least a first HRTF impulse response for a sound emitter at a specified location, for at least a first ear of a virtual listener, the method including the steps of:
(a) determining the location of the sound emitter relative to the first ear of the virtual listener;
(b) determining a first ear relative direction of arrival, and a first ear relative distance of the sound emitter;
(c) determining a first ear centered HRTF for the sound emitter, based on the first ear relative direction of arrival; and
(d) forming the first HRTF impulse response from the first ear centered HRTF, including adding a delay to the first ear centered HRTF derived from a first ear relative distance and also including a gain applied to the first ear centered HRTF according to the first ear relative distance.
7. A method as claimed in claim 6 wherein the delay and gain are calculated by including a first and second ear relative distance.
8. A method of formulating a first ear centered HRTF impulse response for a sound emitter at a predetermined location relative to a first ear, the method including the steps of:
(a) determining a first ear relative direction of arrival of the sound emitter, relative to the first ear of the virtual listener;
(b) determining an undelayed ear centered HRTF impulse response, as a parameterised function of the first ear relative direction of arrival;
(c) determining a head-shadow-delay, as a parameterised function of the first ear relative direction of arrival;
(d) forming the first ear centered HRTF impulse response from the undelayed ear centered HRTF impulse response by the addition of the head-shadow-delay.
9. A method as claimed in any previous claim wherein said method is applied substantially symmetrically for a first and second ear of a listener.
10. A method of spatializing an audio input signal so that it has an apparent external position when played back over headphone transducers, the method including the steps of: (a) forming a series of prototype normalised HRTFs for an audio input signal located at a series of external positions relative to a listener, the prototype normalised HRTFs being substantially invariant along a radial line from the listener's ear;
(b) utilising a series of interpolation functions for interpolating between said series of prototype normalised HRTFs in accordance with an apparent external position relative to the listener, so as to form an undelayed ear centered HRTF;
(c) calculating a delay and gain factor from the radial distance to the apparent external position, and applying the delay and gain factor to the undelayed ear centered HRTF to produce an ear centered HRTF;
(d) utilising the ear centered HRTF to filter the audio input signal to produce an output stream which approximates the effect of projection of the audio to an ear of a listener.
11. A method as claimed in claim 10 wherein said series of interpolation functions comprise a series of polynomials.
12. A method as claimed in claim 11 wherein said series of polynomials are defined in terms of a Cartesian coordinate system centered around the head of a listener.
13. A method as claimed in claim 11 wherein said series of polynomials are defined in terms of a Cartesian coordinate system centered around an ear of a listener.
14. A method as claimed in claim 11 wherein said method is utilized to form both a left and right channel signal for a listener and the same series of prototype normalised HRTFs are used for each ear of the listener.
15. A method as claimed in any one of claims 10 - 14 wherein said prototype normalised HRTFs are stored utilising a high sample rate and said utilising step (d) includes subsampling the prototype normalised HRTFs to filter them with the audio signal.
16. A method of spatializing a series (M) of audio input signal objects each having an apparent external position so that the signals maintain an apparent external position when played back over headphone transducers, the method including the steps of:
(a) for each of the M audio input signal objects:
(i) computing a total delay and gain to be applied to a left-ear HRTF;
(ii) applying the delay and gain to the audio input signal object to produce a first ear delayed signal for the object;
(iii) interpolating a series of polynomials to produce a series (N) of scale factors and scaling the first ear delayed signal for the object, to produce N first ear delayed scaled signals for the object;
(b) producing N combined first-ear-delayed-scaled signals, such that each of the N combined first-ear-delayed-scaled signals is formed by summing together the corresponding first-ear-delayed-scaled signals for the M objects;
(c) filtering the N combined first-ear-delayed-scaled signals through a corresponding series of N prototype filters and summing the outputs of these N filters to produce a first output signal for playback to a first ear.
17. An apparatus adapted to perform the method of any one of claims 1 to 16.
18. A computer-readable storage medium storing a program of instructions that is executable by a device to perform the method of any one of claims 1 to 16.