Bub et al., 1995 - Google Patents
Knowing who to listen to in speech recognition: Visually guided beamforming
- Document ID
- 2428160260266937212
- Author
- Bub U
- Hunke M
- Waibel A
- Publication year
- 1995
- Publication venue
- 1995 International Conference on Acoustics, Speech, and Signal Processing
Snippet
With speech recognition systems steadily improving in performance, freedom from head-sets and push-buttons to activate the recognizer is one of the most important issues to achieve user acceptance. Microphone arrays and beamforming can deliver signals that suppress …
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/42—Systems providing special services or facilities to subscribers
- H04M3/56—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
- H04M3/568—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
- H04M3/569—Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities; audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants; using the instant speaker's algorithm
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10K—SOUND-PRODUCING DEVICES; ACOUSTICS NOT OTHERWISE PROVIDED FOR
- G10K11/00—Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
- G10K11/18—Methods or devices for transmitting, conducting, or directing sound
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bub et al. | | Knowing who to listen to in speech recognition: Visually guided beamforming |
Donley et al. | | Easycom: An augmented reality dataset to support algorithms for easy communication in noisy environments |
DiBiase | | A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays |
JP6464449B2 (en) | | Sound source separation apparatus and sound source separation method |
KR101444100B1 (en) | | Noise cancelling method and apparatus from the mixed sound |
JP2022544138A (en) | | Systems and methods for assisting selective listening |
US20200184991A1 (en) | | Sound class identification using a neural network |
EP2320676A1 (en) | | Method, communication device and communication system for controlling sound focusing |
Taherian et al. | | Multi-channel talker-independent speaker separation through location-based training |
Khan et al. | | Video-aided model-based source separation in real reverberant rooms |
CN111078185A (en) | | Method and equipment for recording sound |
JP2022062875A (en) | | Audio signal processing method and audio signal processing apparatus |
Tesch et al. | | Spatially selective deep non-linear filters for speaker extraction |
Pertilä | | Online blind speech separation using multiple acoustic speaker tracking and time–frequency masking |
KR101976937B1 (en) | | Apparatus for automatic conference notetaking using mems microphone array |
Rabinkin | | Optimum sensor placement for microphone arrays |
Nakadai et al. | | Exploiting auditory fovea in humanoid-human interaction |
Ihara et al. | | Multichannel speech separation and localization by frequency assignment |
KR102412148B1 (en) | | Beamforming method and beamforming system using neural network |
JP2022062876A (en) | | Audio signal processing method and audio signal processing apparatus |
Đurković | | Localization, tracking, and separation of sound sources for cognitive robots |
Flanagan et al. | | Sound capture with three-dimensional selectivity |
Wilson et al. | | Audiovisual arrays for untethered spoken interfaces |
CN113785357A (en) | | Open active noise cancellation system |
Brückmann et al. | | Integration of a sound source detection into a probabilistic-based multimodal approach for person detection and tracking |