WO1998042161A2 - Telephonic transmission of three-dimensional sound - Google Patents
- Publication number
- WO1998042161A2 (application PCT/GB1998/000813)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- signals
- output signals
- channel
- khz
- Prior art date
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S1/00—Two-channel systems
Definitions
- This invention relates to telephonic transmission of three dimensional (3D) sounds and more particularly to an apparatus for communicating three dimensional sounds between two or more remote locations by telephone transmission.
- the present invention is concerned with telephone conference systems irrespective of whether or not a visual image is transmitted at the same time as the audio transmission.
- Video telephone conference systems which employ large, expensive equipment to enable a large group of people in one location to communicate with another group at another location are well known.
- Video telephones which incorporate a camera and video screen at each location are also well known.
- office technologies such as fax, telephone and video such systems are becoming more readily available.
- An object of the present invention is to overcome, or reduce, these undesirable effects by reproducing a three dimensional sound field of the transmitting station at the receiving location.
- Binaural technology is based on using a so-called "artificial head" microphone system to receive sound from a sound source and convert the acoustic energy into an electrical signal which is subsequently processed digitally.
- the use of an artificial head ensures that the natural three dimensional sound cues, which the brain of a listener uses to determine the position of sound sources in three dimensional space, are incorporated into the audio signal.
- The artificial head is preferably constructed to resemble as closely as possible an actual human head and upper torso, and has silicone rubber ears which precisely resemble human ears. In some applications good (but less precise) results can be achieved using two spaced microphones with a block or sheet of wood between the microphones.
- binaural signals is intended to mean two channel or stereophonic signals which include one or more components representing audio diffraction effects created by an artificial head means positioned between a pair of microphones.
- artificial head is intended to cover not only a precise model of a human head but other imprecise models (such as for example a block of wood between microphones) and electrical synthesis of the audio diffraction signals.
- A further problem with artificial-head microphone systems is that, when the reproduced sound is heard through loudspeakers, interaural cross-talk occurs: an audio signal intended for one ear of a listener is also received by the other ear. To compensate for this effect it is well known to employ cross-talk cancellation circuits; see, for example, International Patent Application WO-A-9515069.
- a further object of the present invention is to provide apparatus which enables binaural processing of the audio signals of a telephone conference system.
- apparatus for communicating three dimensional sounds via a telephone link comprising an input device consisting of two spaced microphones operable to produce left and right channel monophonic microphone output signals, signal processing means for each channel comprising filter means for receiving the microphone output signals and modifying the signals to compensate for head related air-to-ear transfer functions and equalise the spectral response of the microphone output signals, cross-talk cancellation means for cancelling out interaural cross-talk between the channels, and data compression means operable to receive an output signal from each channel, combine them to produce a binaural signal and compress said binaural signal to produce a compressed binaural signal for transmission over the telephone link, said compression means using a first compression algorithm to compress frequencies below 1 kHz whilst preserving relative phase differences between the channel output signals, a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences between amplitudes of the channel output signals and a third algorithm to compress frequencies between 1 kHz and 2 kHz whilst preserving the IAD and ITD
- The apparatus further includes a receiving means for receiving a compressed binaural signal transmitted over a telephone link and converting said compressed signal into left and right channel audio output signals, and spaced left and right channel sound reproduction means, each of which is operable to receive a respective channel audio output signal from said receiving means and reproduce sound corresponding to said respective channel audio output signal.
- the sound reproduction means may comprise a pair of loudspeakers, or a pair of headphones.
- the apparatus may be provided with a video signal means comprising a camera operable to produce a video output signal, and the compression means is operable to receive the video output signal and to combine said video output signal with said compressed binaural signal to produce a combined output signal for transmission via the telephone link.
- receiving means further includes means for receiving a video signal transmitted over a telephone link and converting the video signal into a video output signal, and display means operable to receive said video output signal and display a visual image.
- Figure 1 illustrates schematically apparatus incorporating the present invention for telephone conference connection between two conference centres.
- Figure 2 shows in block diagram form apparatus incorporating signal processors in accordance with the present invention.
- Figure 3 shows schematically a human head, and
- Figure 4 shows a further embodiment of the present invention.
- each conference station 10, 11 is provided with a personal computer (PC) 12 which includes a monitor, two spaced microphones 13, 14 mounted in silicone rubber moulded ears 15 (which model precisely human outer ears) and two spaced loudspeakers 16.
- The microphones 13, 14 should ideally be placed about 15 cm apart (the approximate width of a human head). Although it is preferable that the microphones are mounted in moulded ears 15 on an artificial head, the microphones could be mounted in moulded ears mounted on a structure 17 (such as a block or sheet of wood). Alternatively the microphones 13, 14 and moulded ears could simply be mounted on the sides of the computer case 12, but this would give less precise detail to the three-dimensional sound field.
- Each of the stations 10, 11 is connected to the other by means of the public telephone system 27 in the usual way.
- Both microphones 13, 14 are positioned to receive sound generated at the station 10, 11 where they are located.
- Each microphone converts the pressure variations associated with the sound waves that it receives into an analogue electrical signal at inputs 18a, 18b of each channel (representing left and right ears 13, 14) of a digital signal processor 19.
- The processor 19 comprises an HRTF filter 20 and an equalisation filter 21 for each channel.
- HRTF or "Head Related Transfer Function" is intended to mean a function representing the transfer function of a path between a source of sound and the ear of the listener, either the ear nearer the sound (near HRTF) or the ear further from the sound (far HRTF).
- HRTFs may be obtained by measurements on a real human head equipped with suitable microphones. Alternatively, they may be obtained using an artificial head means, which may be, as is common, a precise model of a human head or torso with microphones in the ear structures; it may be something far less precise, for example a block or sheet of wood positioned between a pair of spaced-apart microphones; or it might even be an electrical synthesis circuit or system which creates such functions.
- Filters 21 correct the spectral response to compensate for the mid-range gain associated with the concha-related resonance, as explained in International Patent Applications WO-A-9422278 and WO-A-9515069.
- the outputs 21a, 21b of the filters 21 are fed to cross-talk cancellation circuits 22 which cancel out the interaural crosstalk as explained in International Patent Applications WO-A-9422278 and WO-A-9515069.
- The output signal at each channel output 23 comprises a monophonic digital audio signal.
- the normal signals transmitted over internationally acceptable telephone networks are typically a monophonic signal covering a range of frequencies from about 200 Hz to 3.4 kHz.
- the output signals 23 of each channel are combined and compressed by a signal compression means 25 to produce a stereophonic output signal 24.
- the compression algorithms used by the compression means 25 are designed to preserve the three dimensional cues in the audio output signals 23 from each channel.
- a second key aspect is to preserve the time relationship between the signals in the two channels.
- the manner in which the head and outer ears of a listener modify soundwaves before they are registered by the inner ears is complex, with several contributing factors playing a part.
- each pinna (outer ear flap)
- each pinna together with its auditory canal
- If the sound source is moved to one side of the head of the listener, then the more distant ear lies in the shadow of the head, and the ear closer to the sound source is aligned more on-axis with the source.
- When sound waves encounter the listener's head, they diffract around it. In general, the average width of a human head is 15 cm, with an interaural path length of about 20 cm when the circumference effect is taken into account. Sound waves of greater wavelength than 15 cm (corresponding to frequencies below about 1.7 kHz) can diffract efficiently around a human head, whereas at higher frequencies the sound wave cannot diffract efficiently around the head. This effect, known as "head-shadowing", creates differences in the amplitudes of the sound signals arriving at each ear of the listener. This interaural amplitude difference (IAD) is one of the primary 3D cues which need to be preserved.
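The relation between the head dimensions and the head-shadowing frequency can be checked with a short calculation. This is a minimal sketch, assuming a speed of sound of 343 m/s (a value not given in the text); the function name is illustrative only. Under that assumption the quoted figure of about 1.7 kHz corresponds to the 20 cm interaural path, while a 15 cm wavelength corresponds to roughly 2.3 kHz.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def frequency_for_wavelength(wavelength_m: float) -> float:
    """Return the frequency (Hz) whose wavelength in air equals wavelength_m."""
    return SPEED_OF_SOUND_M_S / wavelength_m


head_width_hz = frequency_for_wavelength(0.15)       # 15 cm head width
interaural_path_hz = frequency_for_wavelength(0.20)  # 20 cm interaural path

print(f"15 cm wavelength -> {head_width_hz:.0f} Hz")
print(f"20 cm wavelength -> {interaural_path_hz:.0f} Hz")
```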
- IAD: interaural amplitude difference
- The effects of diffraction on the intensity of the sound are noticeable in the range between 700 Hz and 8 kHz and are more noticeable at higher frequencies (say above 2 kHz), where the head-shadowing creates noticeable differences in the intensities of the sound waves reaching the ears.
- the listener's brain uses these differences in intensity as cues to locate the direction of the source of high frequency sounds. Therefore it is important to retain the relationship between the intensities (or amplitudes) of the high frequency sounds.
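The intensity relationship between the two channels can be expressed as a level difference in decibels. The sketch below is illustrative only: the function names, the 16 kHz sample rate, the 4 kHz test tone and the factor of 0.5 for the shadowed ear are all assumptions, not values from the text. A positive result means the left channel is louder.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def interaural_amplitude_difference_db(left, right):
    """Level of the left channel relative to the right channel, in dB.

    Assumes neither channel is silent (a zero RMS would divide by zero).
    """
    return 20.0 * math.log10(rms(left) / rms(right))


# Hypothetical test signal: a 4 kHz tone, with the right (shadowed) ear
# receiving half the amplitude of the left ear.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 4000 * n / sample_rate) for n in range(1600)]
left = tone
right = [0.5 * s for s in tone]

iad_db = interaural_amplitude_difference_db(left, right)
print(f"IAD = {iad_db:.2f} dB")
```

A compression scheme that keeps this ratio the same before and after coding retains the high-frequency cue described above, whatever it does to the absolute levels.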
- phase difference is approximately proportional to frequency.
- the listener's brain therefore uses the phase differences of the low frequencies as an important cue to determine the direction of the source of low frequency sounds. It is therefore important to retain the phase relationships between the output signals of the left and right channels for the low frequency sounds.
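The proportionality between phase difference and frequency follows from treating the interaural delay as fixed for a given source direction. The sketch below assumes a hypothetical delay of 0.5 ms; the function name is illustrative. It also shows why the phase cue is mainly useful at low frequencies: once the difference reaches 180 degrees or more, it no longer identifies a unique delay.

```python
def interaural_phase_difference_deg(frequency_hz: float, delay_s: float) -> float:
    """Phase difference in degrees produced by a fixed delay at one frequency."""
    return 360.0 * frequency_hz * delay_s


delay_s = 0.0005  # hypothetical interaural delay of 0.5 ms

for frequency_hz in (250, 500, 1000, 2000):
    phase_deg = interaural_phase_difference_deg(frequency_hz, delay_s)
    # A difference of 180 degrees or more is ambiguous: it could equally
    # be read as a lead or a lag of the other ear.
    ambiguous = phase_deg >= 180.0
    print(f"{frequency_hz:>5} Hz: {phase_deg:6.1f} deg, ambiguous={ambiguous}")
```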
- time-of-arrival differences between the left and right ears of the listener, unless the sound source is exactly in front, behind, above or below the head of the listener.
- ITD: interaural time delay
- Figure 3 shows a plan view of a conceptual head with a left ear (LE) and a right ear (RE) receiving a sound signal from a distant source at azimuth angle θ (about +45° as shown in the drawing).
- the wave front (W - W 1 ) arrives at the right ear (RE)
- the path distance a represents a proportion of the circumference subtended by θ.
- the path length (a+b) is given by.
- ITDs are measured to be slightly greater than this, possibly because of the non-spherical nature of human heads, the complex diffractive situation and surface effects. Hence ITDs lying in the range of 0 to 0.8 ms are also important primary 3D cues.
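The expression for the path length is not reproduced in this text. As an illustration only, the sketch below uses the common spherical-head approximation, in which path a is the arc r·θ around the head and path b is taken to be r·sin θ. That choice of b, the head radius of 7.5 cm (half the 15 cm width mentioned earlier) and the speed of sound of 343 m/s are assumptions, and the function names are hypothetical. For a source at 90° this gives roughly 0.56 ms, inside the 0 to 0.8 ms range quoted above.

```python
import math

HEAD_RADIUS_M = 0.075       # assumed: half of a 15 cm head width
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


def path_length_difference_m(azimuth_deg: float) -> float:
    """Extra path (a + b) to the far ear for azimuths from 0 to 90 degrees.

    Spherical-head approximation (an assumption, not taken from the text):
    a is the arc around the head, b is the straight-line component.
    """
    theta = math.radians(azimuth_deg)
    arc_a = HEAD_RADIUS_M * theta
    straight_b = HEAD_RADIUS_M * math.sin(theta)
    return arc_a + straight_b


def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time delay implied by the path length difference."""
    return path_length_difference_m(azimuth_deg) / SPEED_OF_SOUND_M_S


for azimuth in (0, 45, 90):
    print(f"azimuth {azimuth:>2} deg: ITD = {itd_seconds(azimuth) * 1000:.3f} ms")
```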
- the mid-range gain due to the concha related resonance and the resonance in the auditory canal of the outer ear occurs at about 3 kHz or slightly higher and this is at the extreme end of the normal bandwidth of conventional telephone transmission lines.
- the fossa, a cavity at the uppermost region of the pinna of the outer ear
- the brain of the listener makes use of the higher frequency sounds at 13 kHz or above to assist in determining whether the source of sound is in front of or behind the listener. It is therefore important to retain the detail of high frequency sounds above 13 kHz, if front and back cues are necessary.
- The compression means 25 uses a first algorithm which allows compression of frequencies below 1 kHz whilst preserving phase differences between the channel output signals 23, and uses a second algorithm to compress frequencies above 2 kHz whilst preserving relative differences in the amplitudes of the channel output signals 23.
- the compression means 25 also employs algorithms that allow the compression of the mid range frequencies, whilst preserving the IAD and ITD information over the whole frequency band.
- The compression means 25 thus preserves the phase and amplitude relationships up to 8 kHz for reproducing three dimensional sound fields without front and back cues, or up to 13 kHz, or above, when front and back cues are wanted.
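The band plan described for the compression means 25 can be summarised as a lookup from frequency to the interaural cue that must survive compression. This is a sketch of the band boundaries only, not of the compression algorithms themselves, which the text does not specify. The function names and returned labels are hypothetical, and treating exactly 1 kHz and 2 kHz as part of the middle band is an assumption.

```python
def cue_to_preserve(frequency_hz: float) -> str:
    """Return which interaural relationship to keep for a frequency band."""
    if frequency_hz < 1000.0:
        return "phase"            # first algorithm: relative phase differences
    if frequency_hz > 2000.0:
        return "amplitude"        # second algorithm: relative amplitudes
    return "phase+amplitude"      # mid range: both IAD and ITD information


def upper_band_limit_hz(front_back_cues: bool) -> float:
    """Upper frequency to which the cues are preserved."""
    return 13000.0 if front_back_cues else 8000.0


for frequency_hz in (300, 1500, 4000):
    print(frequency_hz, "Hz ->", cue_to_preserve(frequency_hz))
print("limit without front/back cues:", upper_band_limit_hz(False), "Hz")
```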
- the output signal 24 of the compression means 25 is a compressed binaural signal which is transmitted over a conventional public telephone link 27 to another receiving station 10, 11.
- Each station 10, 11 further includes a receiving means 28 for receiving an incoming compressed combined binaural signal transmitted via the telephone link 27.
- The receiving means 28 (see Figure 2) comprises a signal processor which operates to re-expand the incoming compressed signal 26 and produce two channel input signals 30.
- Each channel input signal 30 is supplied to a sound reproduction device 16 which may be the pair of loudspeakers 16 or a pair of headphones 32.
- The apparatus of Figures 1, 2, and 3 further includes means for transmitting and receiving video signals over a telephone link 27, as shown in Figure 4.
- In Figure 4 the same reference numbers are given to the components that are common to the Figure 2 embodiment.
- each station 10, 11 is provided with a video camera 32 and video processor 33 which is operable to produce a video output signal 34.
- the video output signal 34 from the camera 32 is supplied to the compression means 25 of the signal processor 19 (see Figure 4).
- The compression means 25 includes circuits for combining the binaural output signal 24 with the video output signal 34 to produce a combined video and binaural output signal 36 for transmission over the telephone link.
- the apparatus is also provided with a receiving means 37 for receiving an incoming combined video and binaural signal 38 transmitted over the telephone link 27 from another remote conference centre 10 or 11.
- The receiving means 37 includes a decompression means 39 for expanding the received video and binaural signal 38, and operates to produce a video signal 40 for a video processor 41 and two audio output signals 30 for the speakers 16 or headphones 32.
- The output of the video processor 41 drives the monitor 12 to produce a visual image.
Landscapes
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Stereophonic System (AREA)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP98909666A EP0968624A2 (en) | 1997-03-18 | 1998-03-18 | Telephonic transmission of three dimensional sound |
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB9705565.1 | 1997-03-18 | ||
GBGB9705565.1A GB9705565D0 (en) | 1997-03-18 | 1997-03-18 | Telephone transmission of 3d sound |
GB9707962.8 | 1997-04-19 | ||
GBGB9707962.8A GB9707962D0 (en) | 1997-04-19 | 1997-04-19 | Telephonic transmission of 3D sound |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1998042161A2 (en) | 1998-09-24 |
WO1998042161A3 (en) | 1998-12-17 |
Family
ID=26311214
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB1998/000813 WO1998042161A2 (en) | 1997-03-18 | 1998-03-18 | Telephonic transmission of three-dimensional sound |
Country Status (2)
Country | Link |
---|---|
EP (1) | EP0968624A2 (en) |
WO (1) | WO1998042161A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008043349A2 (en) | 2006-10-12 | 2008-04-17 | Andreas Max Pavel | Method and apparatus for recording, transmitting, and playing back sound events for communication applications |
US9229086B2 (en) | 2011-06-01 | 2016-01-05 | Dolby Laboratories Licensing Corporation | Sound source localization apparatus and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO1994022278A1 (en) * | 1993-03-18 | 1994-09-29 | Central Research Laboratories Limited | Plural-channel sound processing |
WO1995015069A1 (en) * | 1993-11-25 | 1995-06-01 | Central Research Laboratories Limited | Apparatus for processing binaural signals |
US5434913A (en) * | 1993-11-24 | 1995-07-18 | Intel Corporation | Audio subsystem for computer-based conferencing system |
US5596644A (en) * | 1994-10-27 | 1997-01-21 | Aureal Semiconductor Inc. | Method and apparatus for efficient presentation of high-quality three-dimensional audio |
- 1998
  - 1998-03-18 WO PCT/GB1998/000813 (WO1998042161A2): not active, Application Discontinuation
  - 1998-03-18 EP EP98909666A (EP0968624A2): not active, Withdrawn
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008043349A2 (en) | 2006-10-12 | 2008-04-17 | Andreas Max Pavel | Method and apparatus for recording, transmitting, and playing back sound events for communication applications |
DE102006048295A1 (en) * | 2006-10-12 | 2008-04-17 | Andreas Max Pavel | Method and device for recording, transmission and reproduction of sound events for communication applications |
DE102006048295B4 (en) * | 2006-10-12 | 2008-06-12 | Andreas Max Pavel | Method and device for recording, transmission and reproduction of sound events for communication applications |
WO2008043349A3 (en) * | 2006-10-12 | 2008-09-04 | Andreas Max Pavel | Method and apparatus for recording, transmitting, and playing back sound events for communication applications |
JP2010506519A (en) * | 2006-10-12 | 2010-02-25 | アンドレアス、マックス、パベル | Processing and apparatus for obtaining, transmitting and playing sound events for the communications field |
EA013670B1 (en) * | 2006-10-12 | 2010-06-30 | Андреас Макс Павел | Method and apparatus for recording, transmitting and playing back sound events for communication applications |
AP2298A (en) * | 2006-10-12 | 2011-10-31 | Andreas Max Pavel | Method and apparatus for recording, transmitting, and playing back sound events for communication applications. |
US9229086B2 (en) | 2011-06-01 | 2016-01-05 | Dolby Laboratories Licensing Corporation | Sound source localization apparatus and method |
Also Published As
Publication number | Publication date |
---|---|
WO1998042161A3 (en) | 1998-12-17 |
EP0968624A2 (en) | 2000-01-05 |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A2; Designated state(s): CA JP KR US |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): CA JP KR US |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
NENP | Non-entry into the national phase in: | Ref country code: JP; Ref document number: 1998540259; Format of ref document f/p: F |
WWE | Wipo information: entry into national phase | Ref document number: 1998909666; Country of ref document: EP |
WWP | Wipo information: published in national office | Ref document number: 1998909666; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 09381100; Country of ref document: US |
NENP | Non-entry into the national phase in: | Ref country code: CA |
WWW | Wipo information: withdrawn in national office | Ref document number: 1998909666; Country of ref document: EP |