
WO1998042161A2 - Telephonic transmission of three-dimensional sound - Google Patents


Info

Publication number
WO1998042161A2
WO1998042161A2 PCT/GB1998/000813 GB9800813W WO9842161A2 WO 1998042161 A2 WO1998042161 A2 WO 1998042161A2 GB 9800813 W GB9800813 W GB 9800813W WO 9842161 A2 WO9842161 A2 WO 9842161A2
Authority
WO
WIPO (PCT)
Prior art keywords
signal
signals
output signals
channel
khz
Prior art date
Application number
PCT/GB1998/000813
Other languages
French (fr)
Other versions
WO1998042161A3 (en
Inventor
David Monteith
Alastair Sibbald
Martin Peter Todd
Original Assignee
Central Research Laboratories Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from GBGB9705565.1A external-priority patent/GB9705565D0/en
Priority claimed from GBGB9707962.8A external-priority patent/GB9707962D0/en
Application filed by Central Research Laboratories Limited filed Critical Central Research Laboratories Limited
Priority to EP98909666A priority Critical patent/EP0968624A2/en
Publication of WO1998042161A2 publication Critical patent/WO1998042161A2/en
Publication of WO1998042161A3 publication Critical patent/WO1998042161A3/en

Links

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S1/00: Two-channel systems

Definitions

  • This invention relates to telephonic transmission of three dimensional (3D) sounds and more particularly to an apparatus for communicating three dimensional sounds between two or more remote locations by telephone transmission.
  • the present invention is concerned with telephone conference systems irrespective of whether or not a visual image is transmitted at the same time as the audio transmission.
  • Video telephone conference systems which employ large, expensive equipment to enable a large group of people in one location to communicate with another group at another location are well known.
  • Video telephones which incorporate a camera and video screen at each location are also well known.
  • office technologies such as fax, telephone and video, such systems are becoming more readily available.
  • An object of the present invention is to overcome, or reduce, these undesirable effects by reproducing a three dimensional sound field of the transmitting station at the receiving location.
  • Binaural technology is based on using a so-called "artificial head" microphone system to receive sound from a sound source and convert the acoustic energy into an electrical signal which is subsequently processed digitally.
  • the use of an artificial head ensures that the natural three dimensional sound cues, which the brain of a listener uses to determine the position of sound sources in three dimensional space, are incorporated into the audio signal.
  • the artificial head is preferably constructed to resemble as closely as possible an actual human head and upper torso and has silicone rubber ears which precisely resemble human ears, but in some applications good (though less precise) results can be achieved using two spaced microphones with a block or sheet of wood between the microphones.
  • binaural signals is intended to mean two channel or stereophonic signals which include one or more components representing audio diffraction effects created by an artificial head means positioned between a pair of microphones.
  • artificial head is intended to cover not only a precise model of a human head but other imprecise models (such as for example a block of wood between microphones) and electrical synthesis of the audio diffraction signals.
  • a further problem with artificial-head microphone systems is that, when listening to the reproduced sound through loudspeakers, interaural cross-talk occurs: an audio signal intended for one ear of a listener is also received by the other ear. In order to compensate for this effect it is well known to employ cross-talk cancellation circuits. See for example International Patent Application WO-A-9515069.
  • a further object of the present invention is to provide apparatus which enables binaural processing of the audio signals of a telephone conference system.
  • apparatus for communicating three dimensional sounds via a telephone link comprising an input device consisting of two spaced microphones operable to produce left and right channel monophonic microphone output signals, signal processing means for each channel comprising filter means for receiving the microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions and equalise the spectral response of the microphone output signals, cross-talk cancellation means for cancelling out interaural cross-talk between the channels, and data compression means operable to receive an output signal from each channel, combine them to produce a binaural signal and compress said binaural signal to produce a compressed binaural signal for transmission over the telephone link, said compression means using a first compression algorithm to compress frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm to compress frequencies between 1 kHz and 2 kHz whilst preserving the IAD and ITD
  • the apparatus further includes a receiving means for receiving a compressed binaural signal transmitted over a telephone link and converting said compressed signal into left and right channel audio output signals, and spaced left and right channel sound reproduction means each of which is operable to receive a respective channel audio output signal from said receiving means and reproduce sound corresponding to said respective channel audio output signal.
  • the sound reproduction means may comprise a pair of loudspeakers, or a pair of headphones.
  • the apparatus may be provided with a video signal means comprising a camera operable to produce a video output signal, and the compression means is operable to receive the video output signal and to combine said video output signal with said compressed binaural signal to produce a combined output signal for transmission via the telephone link.
  • receiving means further includes means for receiving a video signal transmitted over a telephone link and converting the video signal into a video output signal, and display means operable to receive said video output signal and display a visual image.
  • Figure 1 illustrates schematically apparatus incorporating the present invention for telephone conference connection between two conference centres.
  • Figure 2 shows in block diagram form apparatus incorporating signal processors in accordance with the present invention.
  • Figure 3 shows schematically a human head, and
  • Figure 4 shows a further embodiment of the present invention.
  • each conference station 10, 11 is provided with a personal computer (PC) 12 which includes a monitor, two spaced microphones 13, 14 mounted in silicone rubber moulded ears 15 (which model precisely human outer ears) and two spaced loudspeakers 16.
  • PC personal computer
  • the microphones 13, 14 should ideally be placed about 15 cm apart (the approximate width of a human head) and although it is preferable that the microphones are mounted in moulded ears 15 on an artificial head, the microphones could be mounted in moulded ears mounted on structure 17 (such as a block or sheet of wood). Alternatively the microphones 13, 14 and moulded ears could simply be mounted on the sides of the computer case 12, but this would give less precise detail to the three-dimensional sound field.
  • Each of the stations 10, 11 is connected to the other by means of the public telephone system 27 in the usual way.
  • both microphones 13, 14 are positioned to receive sound generated at their respective station 10, 11, where they are located.
  • Each microphone converts the pressure variations associated with the sound waves that it receives into an analogue electrical signal at inputs 18a, 18b of each channel (representing left and right ears 13,14) of a digital signal processor 19.
  • the processor 19 comprises a HRTF filter 20 and an equalisation filter 21 for each channel.
  • HRTF or "Head Related Transfer Function" is intended to mean a function representing the transfer function of a path between a source of sound and the ear of the listener, either the ear nearer the sound (near HRTF) or the ear further from the sound (far HRTF).
  • HRTFs may be obtained by measurements on a real human head equipped with suitable microphones; alternatively, they may be obtained using an artificial head means, which may be, as is common, a precise model of a human head or torso with microphones in the ear structures; alternatively it may be something far less precise, for example a block or sheet of wood positioned between a pair of spaced apart microphones; it might even be an electrical synthesis circuit or system which creates such functions.
  • Filters 21 correct the spectral response to compensate for the mid-range gain associated with the concha-related resonance, as explained in International Patent Applications WO-A- 9422278 and WO-A-9515069.
  • the outputs 21a, 21b of the filters 21 are fed to cross-talk cancellation circuits 22 which cancel out the interaural crosstalk as explained in International Patent Applications WO-A-9422278 and WO-A-9515069.
  • the output signal at each channel output 23 comprises a monophonic digital audio signal.
  • the normal signals transmitted over internationally acceptable telephone networks are typically a monophonic signal covering a range of frequencies from about 200 Hz to 3.4 kHz.
  • the output signals 23 of each channel are combined and compressed by a signal compression means 25 to produce a stereophonic output signal 24.
  • the compression algorithms used by the compression means 25 are designed to preserve the three dimensional cues in the audio output signals 23 from each channel.
  • a second key aspect is to preserve the time relationship between the signals in the two channels.
  • the manner in which the head and outer ears of a listener modify soundwaves before they are registered by the inner ears is complex, with several contributing factors playing a part.
  • each pinna outer ear flap
  • each pinna together with its auditory canal
  • the sound source is moved to one side of the head of the listener, then the more distant ear lies in the shadow of the head, and the ear closer to the sound source is aligned more on-axis with the source.
  • When sound waves encounter the listener's head, they diffract around it. In general, the average width of a human head is 15 cm with an interaural path length of about 20 cm when the circumference effect is taken into account. Sound waves of greater wavelength than 15 cm (corresponding to frequencies below about 1.7 kHz) can diffract efficiently around a human head whereas at higher frequencies the sound wave cannot diffract efficiently around the head. This effect, known as "head-shadowing", creates differences in amplitudes of the sound signals arriving at each ear of the listener. This interaural amplitude difference (IAD) is one of the primary 3D cues which need to be preserved.
  • IAD interaural amplitude difference
  • the effects of diffraction on the intensity of the sound are noticeable in the range of between 700 Hz and 8 kHz and are more noticeable at higher frequencies (say above 2 kHz), where the head-shadowing creates noticeable differences in the intensities of the sound waves reaching the ears.
  • the listener's brain uses these differences in intensity as cues to locate the direction of the source of high frequency sounds. Therefore it is important to retain the relationship between the intensities (or amplitudes) of the high frequency sounds.
  • phase difference is approximately proportional to frequency.
  • the listener's brain therefore uses the phase differences of the low frequencies as an important cue to determine the direction of the source of low frequency sounds. It is therefore important to retain the phase relationships between the output signals of the left and right channels for the low frequency sounds.
  • time-of-arrival differences between the left and right ears of the listener, unless the sound source is exactly in front, behind, above or below the head of the listener.
  • ITD interaural time delay
  • Figure 3 shows a plan view of a conceptual head with a left ear (LE) and a right ear (RE) receiving a sound signal from a distant source at azimuth angle θ (about +45° as shown in the drawing).
  • the wave front (W - W') arrives at the right ear (RE)
  • the path distance a represents a proportion of the circumference subtended by θ.
  • the path length (a+b) is given by.
  • ITDs are measured to be slightly greater than this, possibly because of the non-spherical nature of human heads, the complex diffractive situation and surface effects. Hence ITDs lying in the range of 0 to 0.8 ms are also important primary 3D cues.
  • the mid-range gain due to the concha related resonance and the resonance in the auditory canal of the outer ear occurs at about 3 kHz or slightly higher and this is at the extreme end of the normal bandwidth of conventional telephone transmission lines.
  • the Fossa a cavity at the uppermost region of the Pinna of the outer ear
  • the brain of the listener makes use of the higher frequency sounds at 13 kHz or above to assist in determining whether the source of sound is in front of or behind the listener. It is therefore important to retain the detail of high frequency sounds above 13 kHz, if front and back cues are necessary.
  • the compression means 25 uses a first algorithm which allows compression of frequencies below 1 kHz, whilst preserving phase differences between the channel output signal, and uses a second algorithm to compress frequencies above 2 kHz, whilst preserving relative differences in the amplitudes of the channel output signals 23.
  • the compression means 25 also employs algorithms that allow the compression of the mid range frequencies, whilst preserving the IAD and ITD information over the whole frequency band.
  • the compression means 25 thus preserves the phase and amplitude relationships up to 8 kHz for reproducing three dimensional sound fields without front and back cues, or up to 13 kHz, or above, when front and back cues are wanted.
  • the output signal 24 of the compression means 25 is a compressed binaural signal which is transmitted over a conventional public telephone link 27 to another receiving station 10, 11.
  • Each station 10, 11 further includes a receiving means 28 for receiving an incoming compressed combined binaural signal transmitted via the telephone link 27.
  • the receiving means 28, (see Figure 2), comprises a signal processor which operates to re-expand the incoming compressed signal 26 and produce two channel input signals 30.
  • Each channel input signal 30 is supplied to a sound reproduction device 16 which may be the pair of loudspeakers 16 or a pair of headphones 32.
  • the apparatus of Figures 1, 2, and 3 further includes means for transmitting and receiving video signals over a telephone link 27 as shown in Figure 4.
  • Figure 4 the same reference numbers are given to the same components that are common to the Figure 2 embodiment.
  • each station 10, 11 is provided with a video camera 32 and video processor 33 which is operable to produce a video output signal 34.
  • the video output signal 34 from the camera 32 is supplied to the compression means 25 of the signal processor 19 (see Figure 4).
  • the compression means 25 includes circuits for combining the binaural output signal 24 with the video output signal 34 to produce a combined video and binaural output signal 36 for transmission over the telephone link.
  • the apparatus is also provided with a receiving means 37 for receiving an incoming combined video and binaural signal 38 transmitted over the telephone link 27 from another remote conference centre 10 or 11.
  • the receiving means 37 includes a decompression means 39 for expanding the received video and binaural signal 38, and operates to supply a video signal 40 to a video processor 41 and two audio output signals 30 to the speakers 16 or headphones 32.
  • the output of the video processor 41 drives the monitor 12 to produce a visual image.
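
The ITD geometry described for Figure 3 can be illustrated numerically. The sketch below is not taken from the specification: it assumes a spherical head of radius r = 7.5 cm (half the quoted 15 cm head width), a speed of sound of 343 m/s, and reads "a proportion of the circumference subtended by θ" as the arc length a = r.θ with θ in radians, alongside the stated b = r.sin θ. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: speed of sound in air, not stated in the text
HEAD_RADIUS_M = 0.075       # assumption: half of the 15 cm head width quoted in the text


def interaural_path_length_m(azimuth_deg, radius_m=HEAD_RADIUS_M):
    """Extra path (a + b) to the far ear for a distant source at 0 to 90 degrees azimuth."""
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("this sketch only covers azimuths from 0 to 90 degrees")
    theta = math.radians(azimuth_deg)
    a = radius_m * theta            # arc around the head subtended by theta (assumed reading)
    b = radius_m * math.sin(theta)  # straight segment, b = r.sin(theta) as stated in the text
    return a + b


def interaural_time_delay_s(azimuth_deg, radius_m=HEAD_RADIUS_M,
                            speed_m_s=SPEED_OF_SOUND_M_S):
    """Time-of-arrival difference between the two ears, in seconds."""
    return interaural_path_length_m(azimuth_deg, radius_m) / speed_m_s


if __name__ == "__main__":
    for azimuth in (0, 45, 90):
        delay_ms = interaural_time_delay_s(azimuth) * 1e3
        print(f"{azimuth:>2} deg: {delay_ms:.3f} ms")
```

Under these assumptions the delay at 90° comes out a little under 0.6 ms, which is consistent with the remark that measured ITDs are slightly greater and lie in the range of 0 to 0.8 ms.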

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The invention relates to telephonic transmission of 3D sound. Existing video conferencing systems suffer from the disadvantage that following transmission of a person speaking, the speaker's voice tends to become 'disembodied'. That is, if a person moves with respect to a microphone, the reproduced voice tends not to move with the speaker. The invention overcomes or reduces this effect by obtaining left and right monophonic signals, modifying the signals to compensate for head related air-to-ear transfer functions and performing equalisation and cross-talk cancellation on the signals. Eventually signals are compressed to produce a compressed binaural signal for transmission along a telephone link so that frequencies are split into separate bands, but relative phase differences between signals in different frequency bands are preserved. 3D sounds are therefore able to be transmitted via telephone links, and reproduced more effectively, than was previously possible.

Description

TELEPHONIC TRANSMISSION OF 3D SOUND
This invention relates to telephonic transmission of three dimensional (3D) sounds and more particularly to an apparatus for communicating three dimensional sounds between two or more remote locations by telephone transmission. The present invention is concerned with telephone conference systems irrespective of whether or not a visual image is transmitted at the same time as the audio transmission.
The concept of video telephone conference systems which employ large, expensive equipment to enable a large group of people in one location to communicate with another group at another location is well known. Video telephones which incorporate a camera and video screen at each location are also well known. With the advent of low cost video telephone systems for personal computers and the integration of office technologies such as fax, telephone and video, such systems are becoming more readily available.
One of the problems of telephone conference systems, particularly those without the transmission of visual images, is that when one is listening to a person speaking at the remote location, the voice of the person speaking seems to be "disembodied". With those systems that transmit visual images as well as the sound, this undesirable effect is more noticeable because, if the person speaking moves about at the remote location relative to the microphone or microphones monitoring the speaker's voice, the voice does not appear to move with the speaker. In those systems where the microphones are voice-actuated any slight noise or speech from one person can switch off the microphone of another person so that the listener becomes confused as to who is speaking.
These undesirable effects are related to the fact that all voices and any background sounds are localised at the same loudspeaker at the receiving station. An object of the present invention is to overcome, or reduce, these undesirable effects by reproducing a three dimensional sound field of the transmitting station at the receiving location.
The processing of binaural signals to produce a highly realistic three dimensional sound image is well known, and is described in International Patent Application No WO-A-9422278. Binaural technology is based on using a so-called "artificial head" microphone system to receive sound from a sound source and convert the acoustic energy into an electrical signal which is subsequently processed digitally. The use of an artificial head ensures that the natural three dimensional sound cues, which the brain of a listener uses to determine the position of sound sources in three dimensional space, are incorporated into the audio signal. The artificial head is preferably constructed to resemble as closely as possible an actual human head and upper torso and has silicone rubber ears which precisely resemble human ears, but in some applications good (though less precise) results can be achieved using two spaced microphones with a block or sheet of wood between the microphones.
For the purpose of the present specification the term "binaural signals" is intended to mean two channel or stereophonic signals which include one or more components representing audio diffraction effects created by an artificial head means positioned between a pair of microphones. The term "artificial head" is intended to cover not only a precise model of a human head but other imprecise models (such as for example a block of wood between microphones) and electrical synthesis of the audio diffraction signals.
There are many problems associated with artificial head sound recordings. For example, because the sound passes through two sets of ears (those of the artificial head and those of the listener) the tonal qualities of the reproduced sounds are not true to life. There is generally a resonance at a frequency of several kHz created in the main cavity of the ear (the concha). This has the effect of boosting the mid-range gain of the reproduced sound, and the reproduced sound appears to lack both low-frequency and high-frequency content. It is known to use equalisation filters to shape, or equalise, the spectral response of the audio signals generated by such artificial head recording means, to compensate for this "twice-through-the-ears" effect. International Patent Application WO-A-9515069 describes a binaural sound system which compensates for this so-called "twice-through-the-ears" effect.
A further problem with artificial-head microphone systems is that, when listening to the reproduced sound through loudspeakers, interaural cross-talk occurs: an audio signal intended for one ear of a listener is also received by the other ear. In order to compensate for this effect it is well known to employ cross-talk cancellation circuits. See for example International Patent Application WO-A-9515069.
A further object of the present invention is to provide apparatus which enables binaural processing of the audio signals of a telephone conference system.
According to one aspect of the present invention there is provided apparatus for communicating three dimensional sounds via a telephone link comprising an input device consisting of two spaced microphones operable to produce left and right channel monophonic microphone output signals, signal processing means for each channel comprising filter means for receiving the microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions and equalise the spectral response of the microphone output signals, cross-talk cancellation means for cancelling out interaural cross-talk between the channels, and data compression means operable to receive an output signal from each channel, combine them to produce a binaural signal and compress said binaural signal to produce a compressed binaural signal for transmission over the telephone link, said compression means using a first compression algorithm to compress frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm to compress frequencies between 1 kHz and 2 kHz whilst preserving the IAD and ITD information over the whole frequency band.
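The three-band arrangement in the preceding statement can be summarised as a simple lookup. The sketch below only records which interaural cue each band's algorithm is said to preserve; it is not a compression algorithm, the function name is hypothetical, and assigning the exact 1 kHz and 2 kHz boundary frequencies to the middle band is an assumption, since the text does not say which band owns them.

```python
def preserved_cue(frequency_hz):
    """Return the cue the text says must survive compression at this frequency."""
    if frequency_hz < 0:
        raise ValueError("frequency must be non-negative")
    if frequency_hz < 1000.0:
        # First algorithm: below 1 kHz, relative phase differences are preserved.
        return "relative phase"
    if frequency_hz > 2000.0:
        # Second algorithm: above 2 kHz, relative amplitude differences are preserved.
        return "relative amplitude"
    # Third algorithm: 1 kHz to 2 kHz (boundaries assumed inclusive), IAD and ITD preserved.
    return "IAD and ITD"


if __name__ == "__main__":
    for f in (200.0, 1500.0, 3400.0, 8000.0):
        print(f"{f:>7.0f} Hz -> preserve {preserved_cue(f)}")
```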
Preferably the apparatus further includes a receiving means for receiving a compressed binaural signal transmitted over a telephone link and converting said compressed signal into left and right channel audio output signals, and spaced left and right channel sound reproduction means each of which is operable to receive a respective channel audio output signal from said receiving means and reproduce sound corresponding to said respective channel audio output signal.
The sound reproduction means may comprise a pair of loudspeakers, or a pair of headphones.
The apparatus may be provided with a video signal means comprising a camera operable to produce a video output signal, and the compression means is operable to receive the video output signal and to combine said video output signal with said compressed binaural signal to produce a combined output signal for transmission via the telephone link.
Preferably receiving means further includes means for receiving a video signal transmitted over a telephone link and converting the video signal into a video output signal, and display means operable to receive said video output signal and display a visual image.
The present invention will now be described by way of example, with reference to the accompanying drawings in which:-
Figure 1 illustrates schematically apparatus incorporating the present invention for telephone conference connection between two conference centres.
Figure 2 shows in block diagram form apparatus incorporating signal processors in accordance with the present invention.
Figure 3 shows schematically a human head, and
Figure 4 shows a further embodiment of the present invention.
Referring to Figure 1 each conference station 10, 11 is provided with a personal computer (PC) 12 which includes a monitor, two spaced microphones 13, 14 mounted in silicone rubber moulded ears 15 (which model precisely human outer ears) and two spaced loudspeakers 16.
The microphones 13, 14 should ideally be placed about 15 cm apart (the approximate width of a human head) and although it is preferable that the microphones are mounted in moulded ears 15 on an artificial head, the microphones could be mounted in moulded ears mounted on structure 17 (such as a block or sheet of wood). Alternatively the microphones 13, 14 and moulded ears could simply be mounted on the sides of the computer case 12, but this would give less precise detail to the three-dimensional sound field.
Each of the stations 10, 11 is connected to the other by means of the public telephone system 27 in the usual way.
Referring to Figure 2, both microphones 13, 14 are positioned to receive sound generated at their respective station 10, 11, where they are located. Each microphone converts the pressure variations associated with the sound waves that it receives into an analogue electrical signal at inputs 18a, 18b of each channel (representing left and right ears 13,14) of a digital signal processor 19.
The processor 19 comprises a HRTF filter 20 and an equalisation filter 21 for each channel. It will be understood that, for the purposes of this specification, "HRTF" or "Head Related Transfer Function" is intended to mean a function representing the transfer function of a path between a source of sound and the ear of the listener, either the ear nearer the sound (near HRTF) or the ear further from the sound (far HRTF). HRTFs may be obtained by measurements on a real human head equipped with suitable microphones; alternatively, they may be obtained using an artificial head means, which may be, as is common, a precise model of a human head or torso with microphones in the ear structures; alternatively it may be something far less precise, for example a block or sheet of wood positioned between a pair of spaced apart microphones; it might even be an electrical synthesis circuit or system which creates such functions.
Filters 21 correct the spectral response to compensate for the mid-range gain associated with the concha-related resonance, as explained in International Patent Applications WO-A-9422278 and WO-A-9515069. The outputs 21a, 21b of the filters 21 are fed to cross-talk cancellation circuits 22 which cancel out the interaural cross-talk as explained in International Patent Applications WO-A-9422278 and WO-A-9515069. The output signal at each channel output 23 comprises a monophonic digital audio signal.
The signals normally transmitted over internationally acceptable telephone networks are typically monophonic, covering a range of frequencies from about 200 Hz to 3.4 kHz. In order to be able to transmit the outputs 23 of each channel over a normal telecommunications line, and reproduce a realistic three dimensional sound field, it is necessary to combine the output signals 23 to produce a stereophonic signal covering a wider range of frequencies (typically lower than 1 kHz and higher than 13 kHz) whilst still being able to differentiate between the left and right channel signals. To do this, the output signals 23 of each channel are combined and compressed by a signal compression means 25 to produce a stereophonic output signal 24. The compression algorithms used by the compression means 25 are designed to preserve the three dimensional cues in the audio output signals 23 from each channel. One of the aspects of this is to preserve a wider range of frequencies than is normal for telephony compression. A second key aspect is to preserve the time relationship between the signals in the two channels.
The manner in which the head and outer ears of a listener modify sound waves before they are registered by the inner ears is complex, with several contributing factors playing a part. When a sound source is directly in front of the listener, then each pinna (outer ear flap), together with its auditory canal, is exposed equally to the sound source. However, when the sound source is moved to one side of the head of the listener, then the more distant ear lies in the shadow of the head, and the ear closer to the sound source is aligned more on-axis with the source. When sound waves encounter the listener's head, they diffract around it. In general, the average width of a human head is 15 cm with an interaural path length of about 20 cm when the circumference effect is taken into account.
Sound waves of greater wavelength than 15 cm (corresponding to frequencies below about 1.7 kHz) can diffract efficiently around a human head whereas at higher frequencies the sound wave cannot diffract efficiently around the head. This effect, known as "head-shadowing", creates differences in amplitudes of the sound signals arriving at each ear of the listener. This interaural amplitude difference (IAD) is one of the primary 3D cues which need to be preserved. The effects of diffraction on the intensity of the sound are noticeable in the range of between 700 Hz and 8 kHz and are more noticeable at higher frequencies (say above 2 kHz), where the head-shadowing creates noticeable differences in the intensities of the sound waves reaching the ears. The listener's brain uses these differences in intensity as cues to locate the direction of the source of high frequency sounds. Therefore it is important to retain the relationship between the intensities (or amplitudes) of the high frequency sounds.
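The interaural amplitude difference described above can be expressed as a level ratio in decibels. The following is a minimal sketch, assuming each ear signal is available as an equal-length block of non-silent samples; the function names `rms` and `interaural_amplitude_difference_db` are illustrative and do not appear in the specification.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def interaural_amplitude_difference_db(left, right):
    """Level of the left-ear block relative to the right-ear block, in dB.

    A positive result means the left ear receives the stronger signal,
    as would be expected when the source is on the listener's left and
    the right ear lies in the head shadow.
    Both blocks must be non-silent (non-zero RMS).
    """
    return 20.0 * math.log10(rms(left) / rms(right))


# Hypothetical example: the shadowed (right) ear receives half the amplitude.
left_block = [0.5, -0.5, 0.5, -0.5]
right_block = [0.25, -0.25, 0.25, -0.25]
iad_db = interaural_amplitude_difference_db(left_block, right_block)
```

A halving of amplitude at the shadowed ear corresponds to a difference of roughly 6 dB, which is the kind of relationship the compression stage is required to leave intact at high frequencies.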
At lower frequencies (below say 1 kHz) there is little or no difference in the intensity of the acoustic energy of the sound waves received at both ears, but there is a marked phase difference. In general terms the phase difference is approximately proportional to frequency. The listener's brain therefore uses the phase differences of the low frequencies as an important cue to determine the direction of the source of low frequency sounds. It is therefore important to retain the phase relationships between the output signals of the left and right channels for the low frequency sounds.
In addition to the IAD there will be time-of-arrival differences between the left and right ears of the listener, unless the sound source is exactly in front of, behind, above or below the head of the listener. This is known as the interaural time delay (ITD) and is depicted in diagram form in Figure 3, which shows a plan view of a conceptual head with a left ear (LE) and a right ear (RE) receiving a sound signal from a distant source at azimuth angle θ (about +45° as shown in the drawing). When the wave front (W - W') arrives at the right ear (RE), it can be seen that there is a path length of (a + b) still to travel before it reaches the left ear (LE). By symmetry, the path length b is equal to the distance from head centre to wave front (W - W'), and hence b = r.sin θ. The path distance a represents a proportion of the circumference subtended by θ. By inspection, the path length (a + b) is given by:
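$$a + b = \left(\frac{\theta}{360}\right) 2\pi r + r\,\sin\theta$$
where r is the radius of the head and θ is expressed in degrees.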
When θ tends to zero so does the path length (a + b); when θ tends to 90° and the head is 15 cm in width, the path length is approximately 19.3 cm and the associated ITD is about 560 μs (taking the speed of sound in air as roughly 343 m/s). In practice, ITDs are measured to be slightly greater than this, possibly because of the non-spherical nature of human heads, the complex diffractive situation and surface effects. Hence ITDs lying in the range of 0 to 0.8 ms are also important primary 3D cues.
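The path-length geometry above can be checked numerically. The sketch below assumes a spherical head of radius 7.5 cm (half the 15 cm width quoted in the text) and a speed of sound of 343 m/s, a value the specification does not state; all function and constant names are illustrative. It also shows why the low-frequency phase difference is approximately proportional to frequency: for a fixed delay, the phase shift is 2π times frequency times delay.

```python
import math

HEAD_RADIUS_M = 0.075        # half of the 15 cm head width given in the text
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at about 20 degrees C


def interaural_path_length_m(azimuth_deg, head_radius_m=HEAD_RADIUS_M):
    """Extra path (a + b) to the far ear for a distant source, 0 to 90 degrees."""
    theta = math.radians(azimuth_deg)
    a = head_radius_m * theta            # arc of the circumference subtended by theta
    b = head_radius_m * math.sin(theta)  # b = r.sin(theta)
    return a + b


def interaural_time_delay_s(azimuth_deg, speed_m_s=SPEED_OF_SOUND_M_S):
    """Interaural time delay implied by the extra path length."""
    return interaural_path_length_m(azimuth_deg) / speed_m_s


def interaural_phase_difference_rad(frequency_hz, azimuth_deg):
    """Phase difference for a pure tone; grows linearly with frequency."""
    return 2.0 * math.pi * frequency_hz * interaural_time_delay_s(azimuth_deg)


path_at_90_m = interaural_path_length_m(90.0)  # about 0.193 m, i.e. 19.3 cm
```

The delay returned by `interaural_time_delay_s` depends directly on the assumed speed of sound, so it should be read as an estimate for this simplified spherical model rather than as a measured value.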
As explained above, the mid-range gain due to the concha-related resonance and the resonance in the auditory canal of the outer ear occurs at about 3 kHz or slightly higher, and this is at the extreme end of the normal bandwidth of conventional telephone transmission lines. Furthermore, it is believed that the fossa (a cavity at the uppermost region of the pinna of the outer ear) creates a resonance at 13 kHz which boosts the higher frequency sounds, and that the brain of the listener makes use of the higher frequency sounds at 13 kHz or above to assist in determining whether the source of sound is in front of or behind the listener. It is therefore important to retain the detail of high frequency sounds above 13 kHz if front and back cues are necessary.
Bearing the above in mind, the compression means 25 uses a first algorithm which allows compression of frequencies below 1 kHz, whilst preserving phase differences between the channel output signals 23, and uses a second algorithm to compress frequencies above 2 kHz, whilst preserving relative differences in the amplitudes of the channel output signals 23.
The compression means 25 also employs algorithms that allow the compression of the mid range frequencies, whilst preserving the IAD and ITD information over the whole frequency band.
The compression means 25 thus preserves the phase and amplitude relationships up to 8 kHz for reproducing three dimensional sound fields without front and back cues, or up to 13 kHz or above when front and back cues are wanted.
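The band-dependent treatment described above can be illustrated with a small routing sketch. The specification does not disclose the compression algorithms themselves, so the code below only shows which interaural cue would need to survive in each band, using the 1 kHz and 2 kHz boundaries from the text. The names `cues_to_preserve` and `interaural_cues`, and the representation of each channel as a complex spectral value per frequency, are assumptions made for illustration.

```python
import cmath
import math

LOW_BAND_MAX_HZ = 1000.0    # below this, phase differences are preserved
HIGH_BAND_MIN_HZ = 2000.0   # above this, amplitude differences are preserved


def cues_to_preserve(frequency_hz):
    """Return the cue names that matter for a component at this frequency."""
    if frequency_hz < LOW_BAND_MAX_HZ:
        return ("phase",)
    if frequency_hz > HIGH_BAND_MIN_HZ:
        return ("amplitude",)
    return ("phase", "amplitude")   # mid band: both IAD and ITD information


def interaural_cues(frequency_hz, left_bin, right_bin):
    """Measure the relevant cues between two non-zero complex spectral values."""
    wanted = cues_to_preserve(frequency_hz)
    cues = {}
    if "phase" in wanted:
        # Phase of left relative to right, in radians.
        cues["phase_rad"] = cmath.phase(left_bin * right_bin.conjugate())
    if "amplitude" in wanted:
        # Level of left relative to right, in dB.
        cues["level_db"] = 20.0 * math.log10(abs(left_bin) / abs(right_bin))
    return cues


low = interaural_cues(500.0, 1j, 1 + 0j)        # phase cue only
mid = interaural_cues(1500.0, 2j, 1 + 0j)       # both cues
high = interaural_cues(3000.0, 2 + 0j, 1 + 0j)  # amplitude cue only
```

A codec built along these lines could compare the cues measured before and after compression in each band to check that the quantities named in the text have been retained.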
The output signal 24 of the compression means 25 is a compressed binaural signal which is transmitted over a conventional public telephone link 27 to another receiving station 10, 11.
Each station 10, 11 further includes a receiving means 28 for receiving an incoming compressed combined binaural signal transmitted via the telephone link 27. The receiving means 28 (see Figure 2) comprises a signal processor which operates to re-expand the incoming compressed signal 26 and produce two channel input signals 30. Each channel input signal 30 is supplied to a sound reproduction device 16 which may be the pair of loudspeakers 16 or a pair of headphones 32.
In the case where it is desired to listen through headphones 31(b), it is preferred not to cancel the interaural cross-talk. It is therefore possible to re-introduce the cancelled cross-talk by combining a signal which is the inverse of the cross-talk cancellation signal with the incoming signal.
In a further embodiment of the invention the apparatus of Figures 1, 2, and 3 further includes means for transmitting and receiving video signals over a telephone link 27 as shown in Figure 4. For simplicity, in Figure 4 the same reference numbers are given to the components that are common to the Figure 2 embodiment.
Referring to Figures 1 and 4, each station 10, 11 is provided with a video camera 32 and video processor 33 which is operable to produce a video output signal 34. The video output signal 34 from the camera 32 is supplied to the compression means 25 of the signal processor 19 (see Figure 4). The compression means 25 includes circuits for combining the binaural output signal 24 with the video output signal 34 to produce a combined video and binaural output signal 36 for transmission over the telephone link 27.
The apparatus is also provided with a receiving means 37 for receiving an incoming combined video and binaural signal 38 transmitted over the telephone link 27 from another remote conference centre 10 or 11. The receiving means 37 includes a decompression means 39 for expanding the received video and binaural signal 38, and operates to supply a video signal 40 to a video processor 41 and two audio output signals 30 to the speakers 16 or headphones 32. The output of the video processor 41 drives the monitor 12 to produce a visual image.

Claims

1. Apparatus for transmitting three dimensional sounds via a telephone link comprising an input device consisting of two spaced microphones operable to produce left and right channel monophonic microphone output signals, signal processing means for each channel comprising filter means for receiving the microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions and equalise the spectral response of the microphone output signals, cross-talk cancellation means for cancelling out interaural cross-talk between the channels, and data compression means operable to receive an output signal from each channel, combine them to produce a binaural signal and compress said binaural signal to produce a compressed binaural signal for transmission over the telephone link, said compression means using a first compression algorithm to compress frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm to compress frequencies between 1 kHz and 2 kHz whilst preserving IAD and ITD information over the whole frequency band.
2. Apparatus according to claim 1 further including receiving means for receiving a compressed binaural signal transmitted over a telephone link and converting said compressed signal into left and right channel audio output signals, and spaced left and right channel sound reproduction means each of which is operable to receive a respective channel audio output signal from said receiving means and reproduce sound corresponding to said respective channel audio output signal.
3. Apparatus according to claim 2 wherein said sound reproduction means comprises a pair of loudspeakers.
4. Apparatus according to claim 2 wherein said sound reproduction means comprises a pair of headphones.
5. Apparatus according to any one of claims 1 to 4 wherein video signal means are provided comprising a camera operable to produce a video output signal, and the compression means is operable to receive the video output signal and to combine said video output signal with said compressed binaural signal to produce a combined output signal for transmission via the telephone link.
6. Apparatus according to any one of the preceding claims wherein the receiving means further includes means for receiving a video signal transmitted over a telephone link and converting the video signal into a video output signal, and display means operable to receive said video output signal and display a visual image.
7. A method of transmitting three dimensional sounds via a telephone link comprising the steps of providing left and right channel monophonic microphone output signals to a signal processing means, filtering said microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions, equalising the spectral response of the microphone output signals, performing cross-talk cancellation of interaural cross-talk between the channels, and data compressing processed output signals, so as to produce a binaural signal and compressing said binaural signal to produce a compressed binaural signal for transmission over the telephone link, wherein said compression means employs three algorithms, a first compression algorithm compresses frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm compresses frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm compresses frequencies between 1 kHz and 2 kHz whilst preserving IAD and ITD information over the whole frequency band.
8. Apparatus and method substantially as herein described with reference to the accompanying drawings.
PCT/GB1998/000813 1997-03-18 1998-03-18 Telephonic transmission of three-dimensional sound WO1998042161A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP98909666A EP0968624A2 (en) 1997-03-18 1998-03-18 Telephonic transmission of three dimensional sound

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB9705565.1 1997-03-18
GBGB9705565.1A GB9705565D0 (en) 1997-03-18 1997-03-18 Telephone transmission of 3d sound
GB9707962.8 1997-04-19
GBGB9707962.8A GB9707962D0 (en) 1997-04-19 1997-04-19 Telephonic transmission of 3D sound

Publications (2)

Publication Number Publication Date
WO1998042161A2 true WO1998042161A2 (en) 1998-09-24
WO1998042161A3 WO1998042161A3 (en) 1998-12-17

Family

ID=26311214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB1998/000813 WO1998042161A2 (en) 1997-03-18 1998-03-18 Telephonic transmission of three-dimensional sound

Country Status (2)

Country Link
EP (1) EP0968624A2 (en)
WO (1) WO1998042161A2 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1994022278A1 (en) * 1993-03-18 1994-09-29 Central Research Laboratories Limited Plural-channel sound processing
WO1995015069A1 (en) * 1993-11-25 1995-06-01 Central Research Laboratories Limited Apparatus for processing binaural signals
US5434913A (en) * 1993-11-24 1995-07-18 Intel Corporation Audio subsystem for computer-based conferencing system
US5596644A (en) * 1994-10-27 1997-01-21 Aureal Semiconductor Inc. Method and apparatus for efficient presentation of high-quality three-dimensional audio

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008043349A2 (en) 2006-10-12 2008-04-17 Andreas Max Pavel Method and apparatus for recording, transmitting, and playing back sound events for communication applications
DE102006048295A1 (en) * 2006-10-12 2008-04-17 Andreas Max Pavel Method and device for recording, transmission and reproduction of sound events for communication applications
DE102006048295B4 (en) * 2006-10-12 2008-06-12 Andreas Max Pavel Method and device for recording, transmission and reproduction of sound events for communication applications
WO2008043349A3 (en) * 2006-10-12 2008-09-04 Andreas Max Pavel Method and apparatus for recording, transmitting, and playing back sound events for communication applications
JP2010506519A (en) * 2006-10-12 2010-02-25 アンドレアス、マックス、パベル Processing and apparatus for obtaining, transmitting and playing sound events for the communications field
EA013670B1 (en) * 2006-10-12 2010-06-30 Андреас Макс Павел Method and apparatus for recording, transmitting and playing back sound events for communication applications
AP2298A (en) * 2006-10-12 2011-10-31 Andreas Max Pavel Method and apparatus for recording, transmitting, and playing back sound events for communication applications.
US9229086B2 (en) 2011-06-01 2016-01-05 Dolby Laboratories Licensing Corporation Sound source localization apparatus and method

Also Published As

Publication number Publication date
WO1998042161A3 (en) 1998-12-17
EP0968624A2 (en) 2000-01-05

Similar Documents

Publication Publication Date Title
AU2008362920B2 (en) Method of rendering binaural stereo in a hearing aid system and a hearing aid system
US7012630B2 (en) Spatial sound conference system and apparatus
US4118599A (en) Stereophonic sound reproduction system
JP4166435B2 (en) Teleconferencing system
US8340315B2 (en) Assembly, system and method for acoustic transducers
JP3435156B2 (en) Sound image localization device
US20160140947A1 (en) Apparatus, Method, and Computer Program for Adjustable Noise Cancellation
CN109640235B (en) Binaural hearing system with localization of sound sources
JP2008543144A (en) Acoustic signal apparatus, system, and method
WO2005125270A1 (en) In-ear monitoring system and method
US20070291967A1 (en) Spartial audio processing method, a program product, an electronic device and a system
JP7070910B2 (en) Video conference system
KR20090077934A (en) Method and apparatus for recording, transmitting, and playing back sound events for communication applications
WO1998042161A2 (en) Telephonic transmission of three-dimensional sound
JP6972858B2 (en) Sound processing equipment, programs and methods
KR102613033B1 (en) Earphone based on head related transfer function, phone device using the same and method for calling using the same
West et al. Teleconferencing system using head-related signals
JP2662825B2 (en) Conference call terminal
JPH02230898A (en) Voice reproduction system
JP2662824B2 (en) Conference call terminal
Horiuchi et al. Adaptive estimation of transfer functions for sound localization using stereo earphone-microphone combination
WO2017211448A1 (en) Method for generating a two-channel signal from a single-channel signal of a sound source
JPH07107599A (en) Headphone receiver
WO2005069680A1 (en) Sound receiving arrangement comprising sound receiving means and sound receiving method

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): CA JP KR US

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
AK Designated states

Kind code of ref document: A3

Designated state(s): CA JP KR US

AL Designated countries for regional patents

Kind code of ref document: A3

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: JP

Ref document number: 1998540259

Format of ref document f/p: F

WWE Wipo information: entry into national phase

Ref document number: 1998909666

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1998909666

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 09381100

Country of ref document: US

NENP Non-entry into the national phase in:

Ref country code: CA

WWW Wipo information: withdrawn in national office

Ref document number: 1998909666

Country of ref document: EP