
Nakagawa et al., 1999 - Google Patents

Using vision to improve sound source separation


Document ID
4167630125237943999
Authors
Nakagawa Y
Okuno H
Kitano H
et al.
Publication year
1999
Publication venue
Proceedings of the National Conference on Artificial Intelligence


Snippet

We present a method of improving sound source separation using vision. Sound source separation is an essential function for accomplishing auditory scene understanding, separating the streams of sound generated by multiple sound sources. By separating a stream …
Continue reading at cdn.aaai.org (PDF)
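The snippet only hints at how visual information can guide separation. As a rough, hypothetical illustration (not the paper's actual algorithm, and with all names, array geometry, and parameters being assumptions), a source direction estimated from vision can steer a simple delay-and-sum beamformer over a microphone array toward the seen speaker:

```python
import numpy as np

# Hypothetical sketch: vision supplies an azimuth for the target source,
# which steers a delay-and-sum beamformer over a linear microphone array.
# All values below are illustrative, not taken from the paper.

SPEED_OF_SOUND = 343.0  # m/s


def steering_delays(mic_positions, azimuth_rad, fs):
    """Per-microphone delays (in samples) for a far-field source at the
    visually estimated azimuth; mic_positions is (n_mics, 2) in metres."""
    unit = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # Project each mic position onto the arrival direction to get time offsets.
    tau = mic_positions @ unit / SPEED_OF_SOUND  # seconds
    tau -= tau.min()  # shift so all delays are non-negative
    return np.round(tau * fs).astype(int)


def delay_and_sum(signals, delays):
    """Advance each channel by its steering delay and average: the target
    direction adds coherently, while off-axis sources are attenuated."""
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        out[: n - d] += sig[d:]
    return out / len(signals)


if __name__ == "__main__":
    fs = 16000
    mics = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
    delays = steering_delays(mics, np.deg2rad(60.0), fs)
    print(delays)
```

For a broadside source (azimuth 90° relative to this x-axis array) the delays are all zero and the beamformer reduces to plain channel averaging; steering off broadside introduces the inter-microphone delays that favour the visually located source.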

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/24: Speech recognition using non-acoustical features
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Similar Documents

Publication / Title
Nakadai et al. Active audition for humanoid
US6967455B2 (en) Robot audiovisual system
Nakagawa et al. Using vision to improve sound source separation
Lang et al. Providing the basis for human-robot-interaction: A multi-modal attention system for a mobile robot
Nakadai et al. Real-time sound source localization and separation for robot audition.
EP1818909B1 (en) Voice recognition system
CN111833899B (en) Voice detection method based on polyphonic regions, related device and storage medium
Mizumoto et al. Design and implementation of selectable sound separation on the Texai telepresence system using HARK
US20090030552A1 (en) Robotics visual and auditory system
Maganti et al. Speech enhancement and recognition in meetings with an audio–visual sensor array
Nakadai et al. Real-time speaker localization and speech separation by audio-visual integration
Bub et al. Knowing who to listen to in speech recognition: Visually guided beamforming
Nakadai et al. Epipolar geometry based sound localization and extraction for humanoid audition
Khan et al. Video-aided model-based source separation in real reverberant rooms
JP3632099B2 (en) Robot audio-visual system
Ban et al. Exploiting the complementarity of audio and visual data in multi-speaker tracking
Nakadai et al. Real-time tracking of multiple sound sources by integration of in-room and robot-embedded microphone arrays
Tesch et al. Multi-channel speech separation using spatially selective deep non-linear filters
Brandstein et al. Microphone‐array localization error estimation with application to sensor placement
Okuno et al. Computational auditory scene analysis and its application to robot audition
Nakadai et al. Exploiting auditory fovea in humanoid-human interaction
Okuno et al. Sound and visual tracking for humanoid robot
JP3843743B2 (en) Robot audio-visual system
Okuno et al. Robot audition: Missing feature theory approach and active audition
Okuno et al. Incorporating visual information into sound source separation