EP1900251A2

EP1900251A2 - Audio processor for narrow-spaced loudspeaker reproduction

Info

Publication number: EP1900251A2
Application number: EP06722926A
Authority: EP
Inventors: Jan Abildgaard Pedersen
Original assignee: AM3D AS
Current assignee: AM3D AS
Priority date: 2005-06-10
Filing date: 2006-04-28
Publication date: 2008-03-19
Also published as: WO2006076926A2; WO2006076926A3

Abstract

Audio processor for processing a set of input audio channels and generating a corresponding processed set of signals adapted for playback via a set of narrow-spaced loudspeakers with the purpose of providing a spatial image widening effect. The audio processor includes a cross talk canceller active only in a pre-selected frequency range, e.g. 1.5-18 kHz, and substantially inactive outside this frequency range. In addition, the audio processor applies substantially similar frequency weightings to the two input audio channels within the mentioned pre-selected frequency range. This frequency weighting is selected such that the processed set of signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals. The frequency weighting is preferably based on a magnitude of an ipsi-lateral or a contra-lateral transfer function, or based on a square root of sum of squares of magnitudes of ipsi-lateral and contra-lateral transfer functions. The audio processor is advantageous since it provides a high sound quality without severe tonal coloration and with a stable spatial widening effect tolerant to listener head movements in spite of very narrow-spaced loudspeakers, such as with a listening angle of 4° or less, e.g. in a mobile phone or other handheld devices. In addition, the processor is advantageous in that it provides a high reproduction quality of both timbre and spatial aspects for normal stereo signals as well as binaural signals, including 3D spatial content in case of binaural input signals, without the need to adapt the processing to the actual input signal type.

Description

AUDIO PROCESSOR FOR NARROW-SPACED LOUDSPEAKER REPRODUCTION

Field of the invention

The invention relates to the field of audio, more specifically to the field of cross talk cancellation, i.e. cancellation of acoustical cross talk introduced during sound reproduction via two loudspeakers, because sound from both loudspeakers will reach both left and right ears of a listener. The invention provides an audio processor for enhancing the spatial image of a conventional stereo signal or a binaural signal when reproduced by a set of narrow-spaced loudspeakers, as well as a mobile unit and a loudspeaker unit comprising such an audio processor. In addition, the invention provides a method of processing an audio signal.

Background of the invention

Cancelling of cross talk in a static listening situation by means of an electrical manipulation of input signals prior to being applied to loudspeakers is well-known in the art. For reproduction of a binaural signal, e.g. sound recorded with an artificial head or synthesized using so-called head related transfer functions or approximations thereof, using two loudspeakers, cross talk cancellation is crucial. The reason is that binaural sound reproduction is based on a precise reproduction of sound pressures at a listener's two ears, and the cross talk introduced in the loudspeaker listening situation will more or less destroy the advantage of the binaural signal.

Cross talk cancellation systems or processors have also been used with normal stereo signals intended for reproduction via a pair of loudspeakers symmetrically positioned in front of a listener and at a distance providing a listening angle of the order of 60° (also referred to as Blumlein stereo). When such normal stereo signals are reproduced via narrow-spaced loudspeakers, such as in a portable radio etc., i.e. with a listening angle of less than e.g. 20°, a poor and narrow spatial image will result. A cross talk canceller and synthesized virtual loudspeaker positions using head related transfer functions can be used to enhance a perceived spatial image in such situations.

Proper cancellation of cross talk requires knowledge of sound transmission from each loudspeaker to both ears of a listener for the relevant listening angle, i.e. the head-related transfer function. A perfect cross talk cancellation is possible in an anechoic environment with a listener in a fixed position. US 4,975,954, by Cooper and Bauck, describes a cross talk cancellation system for playback of audio signals having head related transfer functions imposed thereon. It is proposed to apply an equalization function during recording of such audio signals, e.g. artificial head recordings. The proposed equalizing function is based on ipsi-lateral and contra-lateral transfer function characteristics for the loudspeaker positions in the listening setup. The playback system, comprising a cross talk canceller, is then adapted to apply the inverse of the equalizing function, and thus the effective equalization is neutral. In a second cross talk cancellation system described in US 4,975,954, head related transfer functions are imposed on the input signal that are intended to synthesize positions of virtual loudspeakers in a normal listening setup, e.g. a 60° listening setup. In this second system a frequency weighting is applied that simulates the reciprocal of a transfer function whose magnitude is proportional to the square root of the sum of squares of the magnitudes of acoustic ipsi-lateral and contra-lateral transfer functions related to the listening setup.

US 6,760,447, by Nelson, Kirkeby and Hamada, describes a cross talk cancellation system intended for reproduction via a set of narrow-spaced loudspeakers, i.e. loudspeakers providing a listening angle of 6°-20°. In order to provide a wide spatial sound image, head related transfer functions are applied to the input signal so as to synthesize positions of virtual loudspeakers in a normal listening setup, i.e. with a listening angle substantially wider than the listening angle provided by the actual loudspeakers.

EP 1 225 789, by Kirkeby, describes a cross talk cancellation system that is intended for use with narrow-spaced loudspeakers such as in mobile appliances and being claimed not to introduce substantial unintended coloration. The cross talk cancellation part of the system is only active for a low frequency part of an input signal, below 2 kHz, whereas no cross talk cancellation is applied above 2 kHz.

In a practical listening situation with acoustic reflections and listener movements, a highly reduced effect of cross talk cancellation is the result with known cross talk cancellation systems. In addition, known cross talk cancellation systems suffer from an unnatural coloration or timbre. Furthermore, prior art systems are designed so that different settings are required in order for these systems to be able to properly reproduce binaural signals and normal stereo signals. Thus, the processing must be changed in accordance with the actual type of input signal - e.g. this could be obtained by the user selecting the processing according to the actual signal he intends to play, if a proper result should be achieved. Thus, a wrong selection of processing will result in a poor sound quality.

Summary of the invention

According to the above, it may be seen as an object of the present invention to provide an audio processor that is capable of processing a set of input signals so as to provide a listener with an image widening effect by reproduction via a set of narrow-spaced loudspeakers, also for very narrow-spaced loudspeakers such as in a mobile phone.

Furthermore, the audio processor should provide a natural perceived timbre, i.e. it should not introduce unintended perceived coloration of the set of input signals.

A first aspect of the invention provides an audio processor suited for processing an input set of first and second audio signals and generating a corresponding processed set of first and second audio signals adapted for playback via a set of respective first and second narrow-spaced loudspeakers to provide a listener with left and right ear signals, the audio processor comprising

- a cross talk canceller adapted to at least reduce a resulting cross talk from the input set of first and second audio signals to respective left and right ear signals, wherein the cross talk canceller is active within a pre-selected frequency range and substantially in-active outside this frequency range, and

- frequency weighting means adapted to apply a set of first and second frequency weightings to respective first and second input audio signals within the pre-selected frequency range, said first and second frequency weightings being substantially similar, and said first and second frequency weightings being selected so that the processed set of audio signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals.

By 'narrow-spaced loudspeakers' is understood a setup comprising two loudspeakers providing a listening angle of less than 20°, thus also covering a scenario with very narrow-spaced loudspeakers such as providing a listening angle of 1°-4°, e.g. by listening to a stereo mobile phone at a distance of 40 cm. See Fig. 1 for a definition of 'listening angle' θ.
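The relation between loudspeaker spacing, listening distance and listening angle can be sketched numerically. The snippet below is only an illustration: it assumes θ is the full angle subtended at the listener by two loudspeakers placed symmetrically in front of the listener, and the function name and the 2 cm spacing are hypothetical values chosen for the example, not figures from the text.

```python
import math


def listening_angle_deg(spacing_m: float, distance_m: float) -> float:
    """Full angle (degrees) subtended at the listener by two loudspeakers.

    Assumes a symmetrical setup: the listener sits on the centre line,
    distance_m away from the midpoint between the loudspeakers.
    """
    if spacing_m <= 0 or distance_m <= 0:
        raise ValueError("spacing and distance must be positive")
    return math.degrees(2.0 * math.atan(spacing_m / (2.0 * distance_m)))


# Hypothetical handheld device: loudspeakers 2 cm apart, held at 40 cm.
angle_phone = listening_angle_deg(0.02, 0.40)

# Conventional stereo triangle: spacing chosen so that the angle is 60 degrees.
angle_stereo = listening_angle_deg(2.0 * 2.5 * math.tan(math.radians(30.0)), 2.5)

print(f"handheld: {angle_phone:.2f} deg, stereo: {angle_stereo:.2f} deg")
```

Under these assumptions the handheld case falls within the 1°-4° range mentioned above, while the conventional setup gives 60°.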

By 'cross talk canceller' is understood a signal manipulator that is intended to manipulate first and second input signals with the purpose of generating manipulated first and second signals that include a compensation for acoustical cross talk introduced in a loudspeaker listening setup - i.e. sound from a left loudspeaker reaching the right ear of a listener, and sound from a right loudspeaker reaching the left ear of the listener. The purpose of the cross talk canceller is to completely compensate for the acoustical cross talk, i.e. effectively cancel the cross talk. In other words, the goal of the cross talk canceller is that the first input signal is transferred exclusively to the left ear of the listener and the second input signal is transferred exclusively to the right ear of the listener.

In a preferred embodiment, the cross talk canceller is implemented by assuming a symmetrical listening setup. By a perfectly symmetrical listening setup is understood a listening setup where sound transmission from left loudspeaker to left ear and sound transmission from right loudspeaker to right ear are identical, in other words left and right ipsi-lateral transfer functions are identical. Correspondingly, sound transmission from left loudspeaker to right ear and sound transmission from right loudspeaker to left ear are identical, in other words left-right and right-left contra-lateral transfer functions are identical. Thus, preferred embodiments of the cross talk canceller involve processing the first and second input audio signals with the purpose of compensating the acoustic contra-lateral transfer function.

By 'frequency range' is understood a range from a lower to a higher cut-off frequency, wherein cut-off frequencies are defined e.g. as -3 dB points such as commonly known in the art.

It is to be understood that the frequency weighting means may be adapted to apply the first and second frequency weightings to the set of input audio signals either before or after the set of input audio signals has been processed by the cross talk canceller. Alternatively, the audio processor is implemented so that the first and second frequency weightings are applied as an integrated part of the cross talk canceller.

It is understood that for some applications, the frequency weighting means may be completely omitted. In particular, the frequency weighting means may be omitted in case other parts or devices external to the audio processor provide a frequency weighting such that the mentioned perceived neutral timbre is obtained. E.g. such external frequency weighting may be a frequency weighting applied due to the non-flat frequency response of the associated narrow-spaced loudspeakers. Another example where the frequency weighting means may be omitted is where the input audio signals are pre-processed or synthesized with a suitable frequency weighting prior to being applied to the audio processor.

By 'said first and second frequency weightings being selected so that the processed set of audio signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals' is understood that the first and second frequency weightings are selected so that the audio processor does not add any significant perceived spectral coloration to the set of input audio signals. Hereby is understood that if a timbre listening experiment was carried out with a panel of normal hearing listeners, a version of a reference audio signal processed with an audio processor according to the invention would be recognized as closer to the reference audio signal than a version processed with any prior art audio processor.

E.g. such listening experiment could be carried out as a blind, balanced 2-AFC (two Alternative Forced Choice) experiment, in which the listeners are presented with the reference audio signal and the listeners are asked to respond which of two alternatives provides a perceived timbre closest to a perceived timbre of the reference audio signal. One alternative is the reference signal processed by the audio processor according to the invention, and a second alternative may be any prior art audio processor. It will then be understood that the statement of the first aspect with respect to the first and second frequency weightings is fulfilled if the result of the described 2-AFC listening experiment is that the reference signal processed by the audio processor according to the invention is selected as the one closest to the reference audio signal by 75% or more of the responses. It is to be understood, that the result should be averaged across a number of different wide band reference audio signals, and all audible presentations in the listening experiment (i.e. both the reference audio signal and the two processed versions thereof) should be presented via narrow-spaced loudspeakers following the above definition.
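The 75% criterion can be illustrated with a small calculation. This is a sketch only: it assumes that "averaged across reference signals" means the mean of the per-signal response shares, and the response counts below are made-up placeholder data, not results of any listening test.

```python
def share_preferring_invention(responses_per_signal):
    """Mean share of 2-AFC responses that picked the invention-processed version.

    responses_per_signal: one list of booleans per reference signal, where True
    means the listener judged the invention-processed version closest in timbre
    to the reference. The mean is taken over signals (an assumption).
    """
    shares = [sum(r) / len(r) for r in responses_per_signal]
    return sum(shares) / len(shares)


# Placeholder data for three hypothetical wide band reference signals.
example_responses = [
    [True] * 8 + [False] * 2,
    [True] * 7 + [False] * 3,
    [True] * 9 + [False] * 1,
]

mean_share = share_preferring_invention(example_responses)
criterion_met = mean_share >= 0.75  # threshold taken from the text above
print(f"mean share = {mean_share:.2f}, criterion met: {criterion_met}")
```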

The audio processor according to the first aspect provides significant advantages with respect to achieving a perceived wide spatial image via narrow-spaced loudspeakers, such as in e.g. mobile phones and other portable devices. The audio processor is capable of highly improving a perceived spatial image while still preserving a natural timbre, i.e. without a coloration that is a problem with prior art systems. Since the spatial widening effect is obtained without applying direction specific head related transfer functions to the set of input signals with the intention of providing virtual loudspeaker positions, the audio processor can be used for reproduction of binaural signals as well as stereo signals. I.e. the audio processor can be used for both types of signals without the need for any alterations or different settings, and without requiring the set of input audio signals to be specially equalized. This is in contrast to prior art systems that are either suited for optimum performance with binaural signals or for optimum performance with stereo signals.

Preferred embodiments of the audio processor are capable of providing a spatial image enhancement also by reproduction via very narrow-spaced loudspeakers such as at a listening angle as low as 3°. Optimal performance is obtained with a symmetrical listening setup. However, this spatial image enhancement effect is achieved with the listener positioned within a considerably large space in front of the loudspeakers, i.e. the audio processor provides a spatial image reproduction which is generally tolerant to head movements. By reproduction of stereo signals the audio processor is capable of providing a significant spatial image widening effect compared to reproduction without the audio processor and the spatial image is still without "holes" and has a clear indication of a centre position. By reproduction of binaural signals, the audio processor provides a 3 dimensional sound reproduction with both horizontal and vertical aspects present. Also a clear frontal localization is reproduced.

By introducing a pre-selected frequency range where the cross talk cancellation unit is active, it is possible to avoid or at least limit a number of problems in the design of the filters involved in a typical implementation of a cross talk canceller. The pre-selected frequency range is preferably chosen such that the cross talk cancellation unit is active in the frequency range where a perceived image widening effect is best achieved, i.e. in a frequency range where the human auditory system is most sensitive to spatial attributes of an auditory event. Preferably, the pre-selected frequency range has lower and upper cut-off frequencies in the ranges 0.5-3 kHz and 10-20 kHz, respectively. More preferably, the lower and upper cut-off frequencies are in the ranges 1.0-2.0 kHz and 16-20 kHz, respectively. A preferred choice is 1.5 kHz and 18 kHz, respectively.

According to preferred embodiments, it is recognized that in practical listening situations acoustic reflections from various obstacles and listener movements (e.g. head movements) will destroy a perfect or ideal cross talk cancellation. This non-ideal cross talk cancellation implies that in practice a spatial sound image will be perceived as more "centralized" than would be the case if perfect cross talk cancellation had been achieved, since in practice a virtual sound source (or phantom sound source) cannot be positioned close to one ear of the listener only. With this recognition, a design goal is to avoid adding head related transfer functions with the purpose of providing virtual loudspeaker positions. The reason is that such head related transfer functions actually further increase cross talk and thus, instead of widening a perceived spatial sound image, tend to narrow the perceived spatial sound image in most practical listening situations. Therefore, it is preferred that the design goal is headphone listening rather than loudspeaker listening.

This approach is in contrast to the teachings of the prior art where reproduction of stereo signals is based on applying head related transfer functions for directions indicative of virtual loudspeaker positions. A further advantage of not trying to simulate virtual loudspeaker positions is that the reproduction becomes generally tolerant to head movements. Finally, it becomes possible to reproduce binaural signals with the same audio processor as used for stereo signals, since including a virtual loudspeaker destroys the possibility of obtaining the advantages of a binaural signal.

In preferred embodiments, the cross talk canceller is based on ipsi-lateral and contra-lateral transfer functions that are determined for a specific listening setup, i.e. listening angle as defined above. These ipsi-lateral and contra-lateral transfer functions may be measured in the actual listening setup or retrieved from a database of head related transfer functions for angles corresponding to those of the listening setup.

In a preferred embodiment, the first and second frequency weightings approximately simulate a square root of sum of squares of magnitudes of ipsi-lateral and contra-lateral transfer functions. Alternatively, the first and second frequency weightings may be based on a magnitude of an ipsi-lateral transfer function. The first and second frequency weightings may approximately simulate the magnitude of the ipsi-lateral transfer function combined with a gain factor. Still alternatively, the first and second frequency weightings are based on a magnitude of a contra-lateral transfer function. The first and second frequency weightings may approximately simulate the magnitude of the contra-lateral transfer function combined with a gain factor.
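The weighting alternatives listed above can be written down for a single frequency bin. The following is a minimal sketch, assuming the ipsi-lateral and contra-lateral transfer functions are available as complex values per frequency; the function names, the gain parameter and the sample values are hypothetical and only serve the illustration.

```python
import math


def rss_weighting(ipsi: complex, contra: complex) -> float:
    """Square root of the sum of squared magnitudes of both transfer functions."""
    return math.sqrt(abs(ipsi) ** 2 + abs(contra) ** 2)


def single_path_weighting(transfer: complex, gain: float = 1.0) -> float:
    """Magnitude of one transfer function (ipsi- or contra-lateral) times a gain."""
    return gain * abs(transfer)


# Illustrative values for one frequency bin (not measured data).
a_bin = 3 + 0j   # ipsi-lateral transfer function value
b_bin = 0 + 4j   # contra-lateral transfer function value

w_rss = rss_weighting(a_bin, b_bin)               # 5.0
w_ipsi = single_path_weighting(a_bin, gain=1.5)   # 4.5
print(w_rss, w_ipsi)
```

Because the same weighting is applied to both channels, only the magnitude is computed here; the inter-channel phase relations are left to the cross talk canceller.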

The pre-selected frequency range where the cross talk canceller is active may be obtained by a band pass filter. In a preferred embodiment, such band pass filter is positioned in a cross feed signal path of the cross talk canceller, i.e. in a signal path where the first and second input audio signals are combined. Alternatively, the band pass filter may be adapted to band pass the first and second input signals prior to being applied to the cross talk canceller. The band pass filter may be implemented as a separate filter or as part of a combined filter serving further purposes. As a further alternative, the band pass filter may be implemented as separate low pass and high pass filters that are positioned at different positions in a signal path of the audio processor.

The audio processor may be adapted for use with a set of narrow-spaced loudspeakers positioned relative to a listener to provide a listening angle of less than 20°. The listening angle may be less than 18°, such as less than 16°, such as less than 14°, such as less than 12°, such as less than 10°, such as less than 8°, such as less than 6°, such as less than 5°, such as less than 4°, such as less than 3°, such as less than 2°, such as less than 1°. Thus, the audio processor can be used in connection with built-in loudspeakers of miniature mobile or portable equipment, e.g. a mobile phone which typically has its loudspeakers positioned a few cm apart and thus provides a narrow listening angle when held at a normal operation distance. Optionally, the audio processor is further adapted to apply a filtering in order to compensate for electro-acoustic characteristics of the first and second narrow-spaced loudspeakers. This filtering may be adapted to compensate for a difference in electro-acoustic characteristics between the first and second narrow-spaced loudspeakers. E.g. it may be desirable to use said filtering to compensate for a non-flat magnitude response of the loudspeakers or to compensate for a phase difference between the loudspeakers.

The audio processor may be adapted to receive the input set of audio signals in either an analog format or a digital format, e.g. PCM, I²S, AES/EBU or the like.

The audio processor may be used for normal stereo signals (i.e. "Blumlein" stereo signals adapted for normal loudspeaker reproduction) or for binaural signals (i.e. signals which have head related transfer functions applied thereon).

According to a second aspect, the invention provides a loudspeaker unit comprising

- an input adapted to receive first and second audio signals,

- an audio processor according to the first aspect,

- an amplifier adapted to generate amplified first and second outputs from the audio processor, and

- first and second narrow-spaced loudspeakers adapted to receive the amplified first and second outputs from the amplifier, respectively.

Such loudspeaker unit is especially suited for integration into mobile appliances, e.g. mobile phones.

According to a third aspect, the invention provides a method of processing an input set of first and second audio signals and generating a corresponding processed set of first and second audio signals adapted for playback via a set of respective first and second narrow-spaced loudspeakers to provide a listener with left and right ear signals, the method comprising the steps of:

- performing, within a pre-selected frequency range, a cross talk cancellation processing adapted to at least reduce a resulting cross talk from the input set of first and second audio signals to respective left and right ear signals,

- applying a set of first and second frequency weightings to respective first and second input audio signals within the pre-selected frequency range, said first and second frequency weightings being substantially similar, and said first and second frequency weightings being selected so that the processed set of audio signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals.

With respect to explanation of the features and advantages of the second aspect, the same explanation as for the first aspect applies.

In preferred embodiments, the method includes designing a filter involved in the cross talk cancellation processing by introducing a frequency dependent variable in order to avoid division by zero in a complex division. By introducing such variable, typical division by zero problems by design of cross talk cancellation filters can be avoided. These problems are especially pronounced by narrow-spaced loudspeaker listening setups.

The frequency dependent variable is preferably selected so as to avoid discontinuities in the first and second frequency weightings. Preferably, the frequency dependent variable is a continuous function of frequency. By using continuous functions of frequency, the resulting filter becomes more optimal from a psycho-acoustical point of view.

Preferably, the frequency dependent variable is introduced in an amplitude part of a complex division. Hereby, the phase response of the filter is left undistorted.
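One way to picture a variable that acts on the amplitude part only is sketched below. This is an assumed form, not necessarily the one used in the embodiments: the reciprocal of a complex value is computed with a non-negative term alpha added to its magnitude, so that the phase of the result is exactly that of the plain reciprocal. The function name and the test values are hypothetical.

```python
import cmath


def regularized_inverse(x: complex, alpha: float) -> complex:
    """Approximate 1/x with alpha added to the magnitude only.

    The phase of the result is -phase(x), i.e. the phase of the exact
    reciprocal, while the magnitude is bounded by 1/alpha when x -> 0.
    In a filter design alpha would be a continuous function of frequency.
    """
    magnitude = abs(x) + alpha
    if magnitude == 0.0:
        raise ZeroDivisionError("x is zero and alpha is zero")
    return cmath.rect(1.0 / magnitude, -cmath.phase(x))


# Illustrative values: a denominator close to zero and a small alpha.
x_small = cmath.rect(0.001, 0.7)
y = regularized_inverse(x_small, alpha=0.1)
print(abs(y), cmath.phase(y))
```

With alpha set to zero the function returns the exact reciprocal, so the regularization only matters where the denominator becomes small.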

According to a fourth aspect, the invention provides a mobile unit comprising an audio processor according to the first aspect. The mobile unit may be a portable audio unit or device, and the mobile unit may be selected from the group consisting of: mobile phones, laptop computers, Personal Data Agents, game devices, portable MP3 players, portable radios, portable TV sets, portable CD players, portable DVD players, and small display units. The audio processor is suitable for mobile units since such mobile units often have a set of narrow-spaced loudspeakers that, without special processing, provide a poor perceived spatial sound image.

In a fifth aspect, the invention provides a computer executable program code adapted to perform the method according to the third aspect. Such computer executable program code may be adapted for being executed on a dedicated signal processor or on any other processor, preferably capable of real time processing of the input set of audio signals. It is understood that the mentioned first, second, third, fourth and fifth aspects of the invention may each be combined with any of the other aspects. Thus, e.g. an audio processor according to the first aspect may comprise filters designed according to the mentioned preferred embodiment of the method of the third aspect, and e.g. a loudspeaker unit of the second aspect may likewise comprise features of any of the other aspects. These and other aspects of the invention will be apparent from and elucidated with reference to the embodiments described hereinafter.

Brief description of the drawings

In the following the invention is described in more detail with reference to the accompanying figures, of which

Fig. 1 shows a sketch of a symmetrical listening setup with two loudspeakers positioned to provide a listening angle θ,

Fig. 2 shows a block diagram of a prior art cross talk cancellation system,

Fig. 3 shows a block diagram of an audio processor embodiment,

Fig. 4 shows a block diagram of another preferred audio processor embodiment,

Fig. 5 shows a magnitude plot of a preferred frequency weighting to be applied in order to provide a natural timbre,

Fig. 6 shows still another audio processor embodiment,

Fig. 7 shows the embodiment of Fig. 6 further comprising a compensation filter adapted to equalize the loudspeakers,

Fig. 8 shows a filter implementation of a preferred embodiment,

Fig. 9 shows a sketch of an audio processor according to the invention, indicating that it is capable of receiving and processing (normal) stereo signal as well as binaural signal without the need to change the processing,

Fig. 10 shows a sketch of an audio processor, a set of narrow-spaced loudspeakers and an intermediate storage medium adapted to store a pre-processed set of audio data,

Fig. 11 shows a plot of an example of a continuous frequency variable α introduced in order to avoid division by zero in design of frequency weighting filters, and

Fig. 12 shows a preferred application of the audio processor according to the invention: a mobile unit, e.g. a mobile phone, with a set of narrow-spaced loudspeakers.

While the invention is susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. It should be understood, however, that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the appended claims.

Description of preferred embodiments

Fig. 1 shows a sketch of a preferred listening setup with a listener with left and right ears Ear_L, Ear_R in front of left and right loudspeakers L, R. The listening setup is preferably symmetrical, i.e. the listener is positioned right between the loudspeakers L, R so that it can be assumed that the ipsi-lateral transfer functions, i.e. sound transmissions L-Ear_L and R-Ear_R, are similar. Correspondingly, contra-lateral transfer functions, i.e. sound transmissions L-Ear_R and R-Ear_L, are also similar. The sketch shows a definition of the "listening angle" θ. Preferred embodiments of the audio processor according to the invention are based on and therefore suited for a symmetrical listening setup with narrow-spaced loudspeakers L, R, i.e. loudspeakers spaced and at a distance from the listener so that the listening angle is less than 20°. The audio processor is suited for loudspeakers L, R providing a listening angle as low as 2-3°, or even lower such as 1-2°.

Where not indicated specifically, the principles described in the following also assume "ideal" loudspeakers, i.e. left and right loudspeakers are assumed similar and they are assumed to provide an "ideal" flat magnitude characteristic.

The audio processor embodiments that will be explained in the following are designed specifically with a symmetrical listening setup as a design target. However, it should be noted, that the general principles according to the invention are not exclusively valid for a design where a symmetrical listening setup has been used as a design target. On the contrary, the audio processor according to the invention is well suited for a listening situation where the listener is not precisely symmetrically positioned relative to the loudspeaker, e.g. due to head movements etc., even though it has been designed with a symmetrical listening setup as a target. For special applications it may be preferred to design the audio processor with an asymmetrical listening setup as target.

Fig. 2 shows the basic principles of a cross talk canceller system according to prior art. In Fig. 2 and in the following 'z' indicates that signals illustrated are in the z-domain, i.e. in a frequency domain, and are in general complex. A binaural input signal L(z), R(z) is processed and processed output signals LS_R(z), LS_L(z) are applied to a set of loudspeakers for reproduction. Assuming a symmetrical listening setup as defined above, ipsi-lateral transfer functions A(z) and contra-lateral transfer functions B(z) illustrate sound transmission from the loudspeakers to left and right ears Ear_L(z), Ear_R(z) of the listener. With "ideal" loudspeakers, the relation between processor output and sound pressure at the ears is defined by:

$$
\begin{bmatrix} \mathrm{Ear\_L}(z) \\ \mathrm{Ear\_R}(z) \end{bmatrix}
=
\begin{bmatrix} A(z) & B(z) \\ B(z) & A(z) \end{bmatrix}
\begin{bmatrix} \mathrm{LS\_L}(z) \\ \mathrm{LS\_R}(z) \end{bmatrix}
$$

For proper reproduction of binaural signals, it is required that the input signals R(z), L(z) reach the listener's ears Ear_R(z), Ear_L(z), respectively. In order to obtain this, the cross talk canceller processing part of Fig. 2, i.e. the transfer functions C(z) and D(z), must satisfy:

I.e. the cross talk canceller includes a complex matrix inversion. For most practical implementations C(z) and D(z) are based on measurements of A(z) and B(z) in the actual listening setup, or are synthesized from database values of head related transfer functions. Since A(z) and B(z) are acoustical transfer functions, it generally occurs that in certain narrow frequency ranges the magnitudes of both A(z) and B(z) are low, i.e. in notches of the transfer functions A(z) and B(z). There, the inversion of the above matrix includes a "division by zero", or at least comes close to a "division by zero" situation. This leads to several problems that must be solved in order to achieve a proper sound quality of the cross talk canceller. The same "division by zero" problem also occurs in frequency ranges where A(z) and B(z) have (almost) similar values. For narrow-spaced loudspeakers the transfer functions A(z) and B(z) are very similar, so this is more likely to occur, and a cross talk canceller for narrow-spaced loudspeakers is therefore difficult to design properly.
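The inversion and its sensitivity to similar A(z) and B(z) can be illustrated for a single frequency bin. The following is a minimal sketch only: the function names and the numerical values of A and B are hypothetical and are not taken from any measurement.

```python
def ctc_filters(a, b):
    """Return (C, D) for one frequency bin by inverting [[A, B], [B, A]]."""
    det = a * a - b * b  # determinant of the symmetric 2x2 matrix
    if det == 0:
        raise ZeroDivisionError("A*A - B*B is zero: the matrix is singular")
    return a / det, -b / det


def ear_signals(a, b, c, d, left, right):
    """Pass (left, right) through the canceller, then the acoustic paths."""
    ls_l = c * left + d * right  # left loudspeaker feed
    ls_r = d * left + c * right  # right loudspeaker feed
    return a * ls_l + b * ls_r, b * ls_l + a * ls_r


# Hypothetical single-bin values: B is slightly weaker and phase-shifted.
A = 1.0 + 0.0j
B = 0.9 * complex(0.98, -0.2)
C, D = ctc_filters(A, B)
ear_l, ear_r = ear_signals(A, B, C, D, left=1.0, right=0.0)
print(abs(ear_l), abs(ear_r))  # left input arrives at the left ear only

# When A and B are nearly equal, the required filter gain grows rapidly.
C_narrow, D_narrow = ctc_filters(1.0 + 0.0j, 0.99 + 0.0j)
print(abs(C_narrow))
```

In the second case the determinant is close to zero, so the canceller needs a gain of several tens, which is the practical form of the "division by zero" problem described above.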

According to prior art, the cross talk canceller of Fig. 2 may be modified to also reproduce normal stereo signals. Since normal stereo signals are intended for playback on a set of stereo loudspeakers providing a listening angle of the order of 60°, the cross talk canceller of Fig. 2 may be supplemented by an input processor that adds head related transfer functions to the input signals, e.g. for angles +/-30°, in order to synthesize positions of virtual loudspeakers.

Fig. 3 illustrates an embodiment of an audio processor according to the invention. A set of input audio signals IN is processed in order to produce a set of processed output audio signals OUT that are suited for reproduction via a set of narrow-spaced loudspeakers. The input signals IN are applied to a band pass filter BPF and to a band stop filter BSF with an inverse characteristic. The band pass filter BPF selects the pre-selected frequency range of the input signals IN that is processed by a cross talk canceller CTC, whereas the remaining part of the signal outside the pre-selected frequency range is not further processed and is simply added to the cross talk cancelled part of the input signals IN to generate the final output signals OUT. The sum of the band pass filter BPF and the band stop filter BSF should ideally be 1, i.e. BSF could be calculated as 1-BPF. BPF and BSF can be implemented in a number of ways, e.g. as minimum phase or linear phase filters.
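The band splitting of Fig. 3 can be sketched per frequency bin as follows. This is an illustration under assumptions: the function name, the per-bin list representation and the toy cross feed used in the usage lines are all hypothetical, and the cross talk canceller is passed in as an arbitrary function.

```python
def band_split_process(left, right, bpf, ctc):
    """Apply `ctc` to the band-passed part and add the band-stopped rest.

    left, right: per-bin input values; bpf: per-bin band pass response;
    ctc: function (l, r) -> (l_out, r_out) acting on one bin.
    """
    out_l, out_r = [], []
    for l, r, p in zip(left, right, bpf):
        s = 1.0 - p  # complementary band stop response, BSF = 1 - BPF
        cl, cr = ctc(p * l, p * r)  # only the in-band part is processed
        out_l.append(cl + s * l)
        out_r.append(cr + s * r)
    return out_l, out_r


# Three bins: below, inside and above the pre-selected range.
bpf = [0.0, 1.0, 0.0]
left = [1.0, 1.0, 1.0]
right = [0.0, 0.0, 0.0]

# With a neutral "canceller" the output equals the input, since BPF + BSF = 1.
same_l, same_r = band_split_process(left, right, bpf, lambda l, r: (l, r))

# With a toy cross feed, only the in-band bin is changed.
toy_l, toy_r = band_split_process(
    left, right, bpf, lambda l, r: (l - 0.5 * r, r - 0.5 * l)
)
print(toy_l, toy_r)
```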

Frequency weighting means FWM is applied in the signal branch after the band pass filter BPF. The frequency weighting means FWM serves to apply a frequency weighting to the band pass filtered signal so that the output signals OUT will provide a listener with a perceived timbre that is substantially the same as a perceived timbre of the input signals IN. It is to be understood that the frequency weighting means FWM may be applied in other positions in the signal path, such as after the cross talk canceller CTC.

The embodiment of Fig. 3 has several advantages over prior art. The audio processor is especially suitable for reproduction via very narrow-spaced loudspeakers, e.g. with a listening angle of 1-20°. By proper selection of the pre-selected frequency range and the frequency weighting, as will be described in the following, the processor is capable of reproducing stereo signals as well as binaural signals. The pre-selected frequency range is preferably 1.5-18 kHz. With this choice, the cross talk canceller CTC is active in the frequency range that is most important for psycho-acoustically achieving a spatial image widening. In addition, the aforementioned matrix inversion problem is reduced: at low frequencies the difference between the ipsi-lateral A(z) and contra-lateral B(z) transfer functions is small, and at high frequencies this difference fluctuates rapidly with frequency, and both of these ranges are left out of the cross talk cancellation.

Fig. 4 shows a preferred embodiment where the cross talk canceller CTC of Fig. 3 is implemented as the one of Fig. 2, i.e. as known in prior art, namely by the four transfer functions C(z), C(z), D(z) and D(z) that are applied to pre-equalized versions EQ_R(z), EQ_L(z) of the input signals R(z), L(z). Note that implementation of the pre-selected frequency range is not explicitly shown in Fig. 4. The frequency weighting means, adapted to provide the correct perceived timbre within the pre-selected frequency range, is implemented by the transfer function E(z) that is applied to both input signals R(z), L(z). In a preferred embodiment, the transfer function E(z) is given by:

$$E(z) = \sqrt{|A(z)|^2 + |B(z)|^2}$$

i.e. preferably, E(z) is based on both the ipsi-lateral A(z) and the contra-lateral B(z) transfer functions. This preferred choice of E(z) ensures that the perceived loudness within the pre-selected frequency range and the perceived loudness outside the pre-selected frequency range will be approximately the same. Here it is assumed that a perceived binaural loudness summation can be approximated by a simple left and right power summation.

For very narrow listening angles, such as 1-4°, the ipsi- and contra-lateral transfer functions are quite similar, and thus in such situations E(z) may be approximated by E(z)=A(z)+3 dB or by E(z)=B(z)+3 dB. The 3 dB gain factor is added in order to provide a proper level, i.e. a level comparable with that of the most preferred choice of E(z), which is based on both A(z) and B(z). For less demanding applications, such approximations may also be used for wider listening angles.
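The 3 dB figure follows from the power summation: when |A(z)| and |B(z)| are equal, the square root of the sum of squares is the square root of 2 times |A(z)|, i.e. about 3 dB above |A(z)|. A small numerical sketch, using hypothetical single-bin values for A and B chosen to be similar, compares the power-sum weighting with the approximation:

```python
import math


def timbre_weight(a, b):
    """Square root of the sum of squared magnitudes of A and B."""
    return math.sqrt(abs(a) ** 2 + abs(b) ** 2)


def to_db(x):
    """Convert a linear magnitude to decibels."""
    return 20.0 * math.log10(x)


# Hypothetical values with similar magnitudes, as for a very narrow angle.
A = 0.50 * complex(math.cos(0.3), math.sin(0.3))
B = 0.48 * complex(math.cos(0.5), math.sin(0.5))

exact_db = to_db(timbre_weight(A, B))
approx_db = to_db(abs(A)) + 3.0  # the "A(z) + 3 dB" approximation
print(round(exact_db, 2), round(approx_db, 2))
```

For these values the two levels differ by a fraction of a decibel; the gap grows as |A(z)| and |B(z)| move apart, which is why the approximation is suggested mainly for very narrow listening angles.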

Fig. 5 shows an example of a magnitude plot of a preferred E(z) as a function of frequency. The transfer function E(z) shown in Fig. 5 has been determined according to the above formula based on A(z) and B(z) for a listening angle of 4°. As seen, the frequency weighting has a substantial amplification, especially in the frequency range 1.5-6 kHz with a level of up to around 20 dB, and thus the presence of E(z) has a highly significant audible effect.

Fig. 6 shows a block diagram of a preferred embodiment. Input signals R(z), L(z) are first processed by a combined filter means G(z) that serves two purposes. Within the pre-selected frequency band, G(z) forms a part of the cross talk canceller and in addition serves to apply the frequency weighting with the purpose of providing a correct perceived timbre. Outside the pre-selected frequency range, input signals are only processed by A(z), and thus here G(z) serves to counteract this processing in order to provide a neutral processing outside the pre-selected frequency range from input signals R(z), L(z) to respective output signals LS_R(z), LS_L(z). The signals equalized by G(z) are denoted EQ_R(z) and EQ_L(z). These equalized signals EQ_R(z), EQ_L(z) are then cross talk cancelled. The cross talk cancellation part is implemented with two ipsi-lateral transfer functions A(z) and two contra-lateral transfer functions B(z), i.e. the transfer functions from the listening situation as sketched in Fig. 2. Note that the output from B(z) is subtracted at the summation points. The outputs of the summation points form the processed signals LS_R(z), LS_L(z) that are to be applied to a narrow-spaced set of loudspeakers. The equalized signals EQ_R(z), EQ_L(z) are applied to band pass filters F(z) that serve to ensure that the equalized signals EQ_R(z), EQ_L(z) are fed to the cross feed signal paths of the cross talk canceller in the pre-selected frequency range only. F(z) preferably has a gain of 0 dB in the pre-selected frequency range, preferably 1.5-18 kHz.

The combined filter G(z) preferably has its pass band aligned with the pre-selected frequency range. Preferably, within the pre-selected frequency range, i.e. 1.5-18 kHz, G(z) is given by:

Outside the pre-selected frequency range, i.e. both at frequencies below and at frequencies above the pre-selected frequency range, G(z) is preferably given by:

$$G(z) = \frac{1}{A(z)}$$
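For the pass band, one reading of G(z) that is consistent with the structure of Fig. 6 can be derived from the requirement that the ear signals equal the inputs weighted by E(z). The derivation below is a sketch under two assumptions that go beyond the text: F(z) is taken as exactly 1 inside the pre-selected range, and E(z) is taken as the square root of the sum of squared magnitudes of A(z) and B(z). The argument z is dropped for brevity.

```latex
% Assumption: inside the pre-selected range F = 1 (0 dB).
\begin{aligned}
\begin{bmatrix} LS_L \\ LS_R \end{bmatrix}
  &= G \begin{bmatrix} A & -B \\ -B & A \end{bmatrix}
     \begin{bmatrix} L \\ R \end{bmatrix}, \\
\begin{bmatrix} \mathrm{Ear}_L \\ \mathrm{Ear}_R \end{bmatrix}
  &= \begin{bmatrix} A & B \\ B & A \end{bmatrix}
     \begin{bmatrix} LS_L \\ LS_R \end{bmatrix}
   = G\,(A \cdot A - B \cdot B)
     \begin{bmatrix} L \\ R \end{bmatrix}.
\end{aligned}

% Requiring the ear signals to equal E times the inputs then gives
G = \frac{E}{A \cdot A - B \cdot B}
  = \frac{\sqrt{|A|^2 + |B|^2}}{A \cdot A - B \cdot B}.
```

Outside the pre-selected range the cross feed paths are blocked by F(z), so only A(z) remains in the direct path and the neutral choice is the inverse of A(z).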

The equalizing filter G(z) is preferably implemented as either an FIR or an IIR filter.

According to the preferred embodiments, the cross talk cancellation effect and natural timbre are the target only inside the pre-selected frequency range, e.g. from 1.5 kHz to 18 kHz. Outside this frequency range, "by pass" of the cross talk canceller is the target, i.e. output = input, which means no cross feed filters B(z) and neutral Left-Left and Right-Right filters.

One example of a band pass filter F(z) could be a 10th-order Linkwitz-Riley filter, i.e. two 5th-order Butterworth filters in series. The roll-off at both the lower limit and the upper limit should be 10th order.
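The magnitude response of such a band pass filter can be sketched from the analog Butterworth prototype: cascading two identical Butterworth sections squares the Butterworth magnitude, which gives -6 dB at each cut-off frequency. The function names are hypothetical, the 1.5 kHz and 18 kHz limits are the preferred values from the text, and a digital implementation would differ in detail near the Nyquist frequency.

```python
def lr_lowpass_mag(f, fc, order=10):
    """Magnitude of a Linkwitz-Riley low pass (two Butterworth halves)."""
    half = order // 2  # order of each Butterworth section
    return 1.0 / (1.0 + (f / fc) ** (2 * half))


def lr_highpass_mag(f, fc, order=10):
    """Magnitude of the matching Linkwitz-Riley high pass."""
    half = order // 2
    r = (f / fc) ** (2 * half)
    return r / (1.0 + r)


def bandpass_mag(f, f_lo=1500.0, f_hi=18000.0):
    """Band pass F as a high pass at f_lo in series with a low pass at f_hi."""
    return lr_highpass_mag(f, f_lo) * lr_lowpass_mag(f, f_hi)


for freq in (100.0, 1500.0, 5000.0, 18000.0):
    print(freq, bandpass_mag(freq))
```

Well inside the band the gain is very close to 1 (0 dB), in line with the preferred 0 dB pass band gain of F(z).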

Fig. 7 is another preferred embodiment based on the embodiment of Fig. 6 but with optional additional filters K(z) and L(z) adapted to compensate for electro-acoustical non-ideal behaviour of the right and left loudspeakers, respectively. E.g. K(z) and L(z) may serve to compensate for a non-flat magnitude response with the purpose of obtaining a flat response within a target frequency range. The filters K(z), L(z) may alternatively or additionally compensate a phase response of the loudspeaker(s). In particular, a phase difference between the left and right loudspeakers may be compensated so as to ensure that the resulting electro-acoustical characteristics of the left and right loudspeakers are similar, i.e. symmetrical.

In general, if loudspeaker characteristics are denoted Loudspeaker(z), then the filters K(z), L(z) may be defined by:

$$K(z) = \frac{1}{\mathrm{Loudspeaker}_{Right}(z)} \quad \text{and} \quad L(z) = \frac{1}{\mathrm{Loudspeaker}_{Left}(z)}.$$

For small loudspeakers, such as in miniature mobile equipment, e.g. a mobile phone, the filters K(z), L(z) may be designed in order to limit the applied gain. E.g. K(z) and L(z) may be adapted to provide a high pass effect in order to protect the loudspeakers against high electrical input voltages at low frequencies, such as below 300 Hz or 400 Hz.
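One simple way to limit the applied gain is to cap the inverse of the loudspeaker magnitude response at a fixed maximum. The sketch below is only an illustration: the function name, the 12 dB cap and the example magnitudes are assumptions, and the text does not prescribe this particular rule.

```python
def compensation_magnitude(speaker_mag, max_gain_db=12.0):
    """Per-bin inverse of the loudspeaker magnitude, capped at max_gain_db."""
    limit = 10.0 ** (max_gain_db / 20.0)  # cap as a linear gain
    gains = []
    for m in speaker_mag:
        inverse = 1.0 / m if m > 0.0 else float("inf")
        gains.append(min(inverse, limit))  # never boost beyond the cap
    return gains


# Hypothetical magnitudes: flat, 6 dB down, and a deep low-frequency roll-off.
gains = compensation_magnitude([1.0, 0.5, 0.01, 0.0])
print(gains)
```

The last two bins would need a very large boost to flatten the response; the cap keeps the electrical drive level bounded there, at the cost of leaving the roll-off only partly compensated.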

Fig. 8 shows a diagram of a preferred implementation of the embodiment of Fig. 6 taking into account practical filter design. Thus, Fig. 8 shows an embodiment where the 8 separate filters of Fig. 6 are combined so that only four separate filters are required for the same function. This is advantageous with respect to practical implementation on a signal processor. Only two different combined filters H(z), I(z) are required for each channel. In a most preferred embodiment the same resulting signal processing as described for the embodiment of Fig. 6 is obtained if these combined filters H(z), I(z) are selected as:

$$H(z) = G(z) \cdot A(z), \quad \text{and}$$

$$I(z) = G(z) \cdot F(z) \cdot (-B(z)),$$

where A(z), B(z), F(z) and G(z) refer to the above definitions.
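That the four combined filters reproduce the eight-filter structure can be checked numerically for a single frequency bin. This is a sketch with hypothetical function names and arbitrary complex test values; it only verifies the algebra of the two signal flows.

```python
def fig6_outputs(a, b, f, g, left, right):
    """Eight-filter structure: equalize with G, then A direct and -F*B cross."""
    eq_l, eq_r = g * left, g * right
    ls_l = a * eq_l - b * (f * eq_r)
    ls_r = a * eq_r - b * (f * eq_l)
    return ls_l, ls_r


def fig8_outputs(a, b, f, g, left, right):
    """Four-filter structure with H = G*A (direct) and I = G*F*(-B) (cross)."""
    h = g * a
    i = g * f * (-b)
    return h * left + i * right, h * right + i * left


# Arbitrary complex values for one bin.
args = dict(a=0.8 + 0.1j, b=0.6 - 0.2j, f=1.0 + 0.0j, g=1.5 - 0.4j,
            left=0.3 + 0.7j, right=-0.2 + 0.5j)
six = fig6_outputs(**args)
eight = fig8_outputs(**args)
print(six, eight)
```

Both structures give the same loudspeaker signals, so the choice between them is purely a matter of implementation cost.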

Fig. 9 shows a sketch of the audio processor according to the invention. The sketch serves to illustrate that the audio processor according to the invention is suitable for reproduction of stereo signals as well as binaural signals. Thus, the audio processor can be used in applications where the nature of a set of input audio signals is unknown and where the audio processor must be able to provide a high sound quality and utilize the full spatial potential of the input signals no matter which type they are. E.g. when the audio processor is used in a mobile phone, ring tones may be binaural recordings or binaurally synthesized sounds intended to provide the listener with a true 3D spatial effect. However, the same mobile phone may also be used to play music files, e.g. in MP3 format, such music files being normal stereo signals without head related transfer functions added thereon. The audio processor according to the invention is capable of providing an optimal spatial reproduction via narrow-spaced loudspeakers for both binaural signals and stereo signals without the need for any manual or automatic switch of processing parameters. Thus, the audio processor is simple to operate and simple to implement.

Fig. 10 shows a sketch of the audio processor according to the invention where a set of input signals IN is processed and a pre-processed version of the input signals is stored on a storage medium, e.g. a CD, a DVD, a hard disc, a memory card or computer memory and the like. After being stored for a period and for example transported, the stored pre-processed signals can be reproduced via a set of narrow-spaced loudspeakers. The storage medium may be a computer, e.g. a server connected to the Internet. For applications where an intermediate storage medium is acceptable or even desirable, it is possible to off-line process audio signals with the audio processor according to the invention. Hereby, there is no need for a small scale signal processor capable of performing real time processing. The processing may instead be performed on a large powerful computer.

In connection with storage on a storage medium, the pre-processed signals may be compressed using audio compression techniques such as known in the art, in order to save storage space.

If pre-processed audio signals are stored in a file on a server connected to the Internet, a user can download the file from the Internet for use on his/her miniature mobile equipment with narrow-spaced loudspeakers. For example music files, ring tones for mobile phones etc. may be pre-processed and subsequently downloaded by users of the mobile equipment, who can then play the files utilizing the spatial image enhancement without the need for an audio processor present in their mobile equipment.

Fig. 11 serves to illustrate an example of a frequency dependent variable α(z) introduced during design of filters according to a preferred audio processing method. The variable α(z) may be introduced during design of the cross talk cancellation filters C(z) and D(z), with reference to Fig. 4, with the purpose of limiting the aforementioned "division by zero" problems occurring in the matrix inversion, i.e. it provides a numerically robust method. A standard calculation of the cross talk cancellation filters C(z) and D(z) is performed as:

$$C(z) = \frac{A(z)}{A(z) \cdot A(z) - B(z) \cdot B(z)} \quad \text{and}$$

$$D(z) = \frac{-B(z)}{A(z) \cdot A(z) - B(z) \cdot B(z)}$$

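According to a preferred method, separate calculations of phase and amplitude are performed. When calculating the amplitude, a small number α(z) is added to the denominator of the fraction involved in the calculation. Since α(z) is only involved in the amplitude part of the calculation, the introduction of α(z) does not lead to any phase distortion. Thus, according to a preferred method C(z) and D(z) are calculated as:

$$C(z) = \frac{A(z)}{\dfrac{A(z) \cdot A(z) - B(z) \cdot B(z)}{|A(z) \cdot A(z) - B(z) \cdot B(z)|} \cdot \left( |A(z) \cdot A(z) - B(z) \cdot B(z)| + \alpha(z) \right)} \quad \text{and}$$

$$D(z) = \frac{-B(z)}{\dfrac{A(z) \cdot A(z) - B(z) \cdot B(z)}{|A(z) \cdot A(z) - B(z) \cdot B(z)|} \cdot \left( |A(z) \cdot A(z) - B(z) \cdot B(z)| + \alpha(z) \right)}$$

Here it should be noted that the first term in the denominator is a complex number with an amplitude of 1, i.e.:

$$\left| \frac{A(z) \cdot A(z) - B(z) \cdot B(z)}{|A(z) \cdot A(z) - B(z) \cdot B(z)|} \right| = 1,$$

and the second term in the denominator is a real number, i.e. it has a phase of zero degrees:

$$\left( |A(z) \cdot A(z) - B(z) \cdot B(z)| + \alpha(z) \right) \in \mathbb{R}.$$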

Preferably, α(z) is chosen depending on frequency so that it is zero except in frequency regions where the term |A(z) · A(z) - B(z) · B(z)| is zero or is below a predefined threshold. In such frequency regions α(z) is chosen such that a division by zero or a division by "close to zero" is avoided, and thus the resulting filter will exhibit a more continuous function of frequency without severe peaks that would be critical from a psycho-acoustical point of view. Thus, even though A(z) and B(z) are numerically close to each other in some frequency regions due to the narrow-spaced loudspeaker listening setup, introduction of α(z) serves to remedy any negative effect during design of the filters.
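The separate treatment of amplitude and phase can be sketched for one frequency bin as follows. The function name and the numerical values are hypothetical; the fallback phase used when the determinant is exactly zero is an assumption of this sketch, since the phase is undefined there.

```python
import cmath


def regularised_ctc(a, b, alpha):
    """Return (C, D) with alpha added to the amplitude of A*A - B*B only."""
    det = a * a - b * b
    mag = abs(det)
    # Unit-magnitude phase term; assumed to be 1 when det is exactly zero.
    unit = det / mag if mag > 0.0 else 1.0 + 0.0j
    denom = unit * (mag + alpha)  # phase of det, amplitude |det| + alpha
    return a / denom, -b / denom


# Hypothetical bin where A and B are nearly equal.
A = 1.0 + 0.0j
B = 0.99 * cmath.exp(-0.05j)
c_plain, d_plain = regularised_ctc(A, B, alpha=0.0)
c_reg, d_reg = regularised_ctc(A, B, alpha=0.2)

print(abs(c_plain), abs(c_reg))  # gain is reduced by alpha
print(cmath.phase(c_reg / c_plain))  # phase is left unchanged
```

With alpha set to zero the function reduces to the standard calculation, so α(z) can be left at zero wherever the determinant is comfortably away from zero.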

The example of α(z) shown in the graph of Fig. 11 is for a listening angle of 4°. It is seen that α(z) is zero at lower frequencies but reaches numerical values of up to 0.20 at a frequency around 7 kHz, where it serves to force the mentioned denominator to have a value different from zero.

As is also illustrated by the graph of Fig. 11, an important feature of α(z) is that it depends continuously on frequency. A continuous frequency dependence of α(z) helps to provide a resulting filter which also has a smoother frequency dependence, and thus results in a better sound quality than an abrupt characteristic with peaks and dips.
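A continuous α as a function of frequency could, for example, be shaped as a smooth bump that is zero away from the problematic region. The sketch below uses a raised-cosine shape; the shape, the function name and the 1.5 kHz half width are assumptions for illustration only, while the 7 kHz centre and the 0.20 peak echo the example values mentioned for Fig. 11.

```python
import math


def alpha_bump(f_hz, centre_hz=7000.0, half_width_hz=1500.0, peak=0.20):
    """Raised-cosine bump: `peak` at the centre, falling smoothly to zero."""
    d = abs(f_hz - centre_hz)
    if d >= half_width_hz:
        return 0.0  # alpha is zero outside the problematic region
    return 0.5 * peak * (1.0 + math.cos(math.pi * d / half_width_hz))


for freq in (1000.0, 5500.0, 6250.0, 7000.0, 7750.0, 8500.0):
    print(freq, round(alpha_bump(freq), 4))
```

Because the bump reaches zero without a step at its edges, the resulting filter magnitude does not jump where α switches on or off.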

For calculation of the filter G(z) as in the embodiment of Fig. 6, the scheme changes accordingly, but the same strategy for choice of α(z) as mentioned above applies. For calculation of G(z) in the pass band, i.e. in the pre-selected frequency range, α(z) is introduced as:

The method of adding a small number α(z) in the denominator may also be used outside the pre-selected frequency range, i.e. in the stop band of the cross talk cancellation system (when calculating G(z) as the inverse of A(z)); however, this is usually not necessary.
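By analogy with the regularised calculation of C(z) and D(z), one plausible form of the pass band G(z) with α(z) keeps the phase of A(z)·A(z) - B(z)·B(z) and adds α(z) to its amplitude only. The expression below is a sketch under the assumption that the unregularised pass band G(z) is E(z) divided by A(z)·A(z) - B(z)·B(z), with E(z) the square root of the sum of squared magnitudes; neither assumption is confirmed by the text as reproduced here.

```latex
% Assumed pass band form, regularised in the amplitude part only.
G(z) = \frac{\sqrt{|A(z)|^2 + |B(z)|^2}}
            {\dfrac{A(z) \cdot A(z) - B(z) \cdot B(z)}
                   {|A(z) \cdot A(z) - B(z) \cdot B(z)|}
             \cdot \left( |A(z) \cdot A(z) - B(z) \cdot B(z)| + \alpha(z) \right)}

% Analogous stop band form, if regularisation is wanted there at all.
G(z) = \frac{1}{\dfrac{A(z)}{|A(z)|} \cdot \left( |A(z)| + \alpha(z) \right)}
```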

Fig. 12 shows a sketch of a preferred application of the audio processor according to the invention, namely for use in a mobile unit, e.g. a mobile phone or the like. In Fig. 12 a listener is sketched in front of a mobile unit with loudspeakers placed so as to radiate sound from the front or sides of the mobile unit housing. Sound transfer from the loudspeakers to the listener's ears, including cross talk, is indicated with arrows. The audio processor according to the invention is preferably built into the mobile unit. This may be done in a number of ways, e.g. as a stand-alone audio processor unit implemented in hardware and inserted in the audio signal path of the mobile unit, for instance between its main signal processor and a power amplifier driving the loudspeakers. The audio processor may also be implemented partly or fully integrated with the main signal processor of the mobile unit, or it may be fully implemented in software executed by the main signal processor of the mobile unit. In order to further enhance a spatial sound image, mobile phones with loudspeakers positioned in top and bottom parts of their housing may be preferred, since in this way the loudspeaker distance is increased and thus the listening angle is increased.

Yet another alternative for implementing the audio processor according to the invention in a mobile unit is, according to an aspect of the invention, to integrate the audio processor in a loudspeaker unit. Preferably, such a loudspeaker unit has a set of narrow-spaced loudspeakers and a suitable power amplifier driving them, wherein the audio processor is inserted in a signal path before the power amplifier and thus serves as an input processor of the loudspeaker unit. Preferably, the loudspeaker unit has a housing with dimensions suited for the mobile unit so that acoustical openings of the loudspeaker housing are fitted to corresponding openings of the mobile unit casing, e.g. so that the acoustic outputs from the loudspeakers are positioned on opposite sides of the mobile unit, or on opposite parts of a front part of the mobile unit casing, e.g. on opposite sides of the display of the mobile unit.

In preferred embodiments of the loudspeaker unit, the audio processor has fixed filter parameters chosen specially to suit an assumed listening angle with the actual distance between the loudspeakers of the loudspeaker unit, assuming a reasonable listener distance to the loudspeakers.

The loudspeaker unit may be adapted to receive a digital or an analog input audio signal. In digital embodiments, the loudspeaker unit may be adapted to receive a digital input signal according to a digital format as known in the art. Hereby, the loudspeaker unit can be integrated with a mobile unit which is predominantly based on digital audio signals. The loudspeaker unit may optionally comprise a "subwoofer" loudspeaker adapted to assist the set of narrow spaced loudspeakers at low frequencies. Such "subwoofer" may be positioned with its acoustical opening inside the mobile unit casing.

Listening tests with preferred embodiments of the audio processor have shown that the sound quality of reproduction via narrow-spaced loudspeakers is significantly improved compared to prior art processors. The improvement is both with respect to spatial aspects and with respect to providing a neutral timbre. Due to the design of the audio processor, it is equally well suited for providing an image widening effect for stereo signals and for providing a 3D image effect by reproduction of binaural signals.

Even by reproduction via the built-in loudspeakers of today's miniature mobile phones held at a distance of approximately 40 cm, a considerable image width is obtained by reproduction of a normal stereo signal, and still without suffering from the unnatural timbre of prior art processors. This is partly obtained by going against prior art teachings, namely by avoiding additional head related transfer functions for synthesizing virtual loudspeaker positions. Partly, the general high sound quality of the processor is obtained by proper choice of frequency weightings. An important effect of the technical features of the audio processor is that it can be used for playback of both stereo signals and binaural signals without any negative effect.

In the claims, the term "comprising" does not exclude the presence of other elements or steps. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. In addition, singular references do not exclude a plurality. Thus, references to "a", "an", "first", "second" etc. do not preclude a plurality.

Reference signs in the claims merely serve to increase readability. These reference signs should not in any way be construed as limiting the scope of the claims, but are included as illustrating examples only.

Claims

1. An audio processor suited for processing an input set of first and second audio signals and generate a corresponding processed set of first and second audio signals adapted for playback via a set of respective first and second narrow-spaced loudspeakers to provide a listener with left and right ear signals, the audio processor comprising

- frequency weighting means adapted to apply a set of first and second frequency weightings to respective first and second input audio signals, within the pre-selected frequency range, said first and second frequency weightings being substantially similar, and said first and second frequency weightings being selected so that the processed set of audio signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals.

2. Audio processor according to claim 1, wherein the first and second frequency weightings are based on a magnitude of an ipsi-lateral transfer function.

3. Audio processor according to claim 2, wherein the first and second frequency weightings approximately simulate the magnitude of the ipsi-lateral transfer function combined with a gain factor.

4. Audio processor according to claim 1, wherein the first and second frequency weightings are based on a magnitude of a contra-lateral transfer function.

5. Audio processor according to claim 4, wherein the first and second frequency weightings approximately simulate the magnitude of the contra-lateral transfer function combined with a gain factor.

6. Audio processor according to claim 1, wherein the first and second frequency weightings approximately simulate a square root of sum of squares of magnitudes of ipsi-lateral and contra-lateral transfer functions.

7. Audio processor according to claim 1, wherein the pre-selected frequency range has lower and upper cut-off frequencies in the ranges 0.5-3 kHz and 10-20 kHz, respectively.

8. Audio processor according to claim 7, wherein the lower and upper cut-off frequencies are in the ranges 1.0-2.0 kHz and 16-20 kHz, respectively.

9. Audio processor according to any of the preceding claims, further comprising a band pass filter adapted to band pass the first and second input signals prior to being applied to the cross talk canceller.

10. Audio processor according to any of claims 1-8, further comprising a band pass filter positioned in a cross feed signal path of the cross talk canceller.

11. Audio processor according to any of the preceding claims, wherein the audio processor is adapted for use with a set of narrow-spaced loudspeakers positioned relative to a listener to provide a listening angle of less than 6°.

12. Audio processor according to claim 9, wherein the listening angle is less than 4°.

13. Audio processor according to any of the preceding claims, further being adapted to apply a filtering in order to compensate an electro-acoustic characteristic of the first and second narrow-spaced loudspeakers.

14. Audio processor according to claim 13, wherein the filtering is adapted to compensate a difference in electro-acoustic characteristics between the first and second narrow-spaced loudspeakers.

15. Audio processor according to any of the preceding claims, the audio processor being adapted to receive the input set of audio signals in a format selected from the group consisting of: analog format, and digital format.

16. Use of the audio processor according to any of the preceding claims, for first and second audio input signals being selected from the group consisting of: normal stereo signals, and binaural signals.

17. A loudspeaker unit comprising

- an input adapted to receive first and second audio signals,

- an audio processor according to any of claims 1-15,

18. Method of processing an input set of first and second audio signals and generate a corresponding processed set of first and second set of audio signals adapted for playback via a set of respective first and second narrow-spaced loudspeakers to provide a listener with left and right ear signals, the method comprising the steps of:

- performing, within a pre-selected frequency range, a cross talk cancellation processing adapted to at least reduce a resulting cross talk from the input set of first and second audio signals to respective left and right ear signals,

- applying a set of first and second frequency weightings to respective first and second input audio signals within the pre-selected frequency range, said first and second frequency weightings being substantially similar, and said first and second frequency weightings being selected so that the processed set of audio signals provides a listener with a perceived timbre being substantially the same as a perceived timbre provided by the input set of audio signals.

19. Method according to claim 18, wherein a filter involved in the cross talk cancellation processing is designed by introducing a frequency dependent variable in order to avoid division by zero in a complex division.

20. Method according to claim 19, wherein the frequency dependent variable is a continuous function of frequency.

21. Method according to claim 19, wherein the frequency dependent variable is selected so as to avoid discontinuities in the filter involved in the cross talk cancellation processing.

22. Method according to any of claims 19-21, wherein the frequency dependent variable is introduced in an amplitude part of a complex division.

23. A mobile unit comprising an audio processor according to any of claims 1-15.

24. Mobile unit according to claim 23, the mobile unit being selected from the group consisting of: mobile phones, laptop computers, Personal Data Agents, game devices, portable MP3 players, portable radios, portable TV sets, portable CD players, portable DVD players, and small display units.