Nakagawa et al., 1999 - Google Patents

Using vision to improve sound source separation
- Document ID
- 4167630125237943999
- Author
- Nakagawa Y
- Okuno H
- Kitano H
- et al.
- Publication year
- 1999
- Publication venue
- PROCEEDINGS OF THE NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE
Snippet
We present a method of improving sound source separation using vision. The sound source separation is an essential function to accomplish auditory scene understanding by separating stream of sounds generated from multiple sound sources. By separating a stream …
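The snippet describes the idea only at a high level. As a rough, hypothetical illustration of how a visually estimated speaker direction can drive acoustic separation (the classifications below list microphone-array beamforming among the related techniques), here is a minimal delay-and-sum beamformer sketch; the function name, array geometry, and parameters are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def delay_and_sum(signals, mic_positions, direction_deg, fs=16000, c=343.0):
    """Average the channels after delaying each one so that a plane wave
    arriving from `direction_deg` lines up across the microphones."""
    theta = np.deg2rad(direction_deg)
    look = np.array([np.cos(theta), np.sin(theta)])   # unit vector toward the source
    n = signals.shape[1]
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, pos in zip(signals, np.asarray(mic_positions, dtype=float)):
        tau = (pos @ look) / c                        # arrival-time advance at this mic
        # A delay of `tau` seconds is a linear phase ramp in the frequency domain.
        out += np.fft.irfft(np.fft.rfft(sig) * np.exp(-2j * np.pi * freqs * tau), n)
    return out / len(signals)

# Toy example: a 500 Hz plane wave arriving from 0 degrees at a two-mic array.
fs, f0, n = 16000, 500.0, 1600                        # 50 full cycles -> periodic frame
t = np.arange(n) / fs
mics = np.array([[0.0, 0.0], [0.2, 0.0]])             # 20 cm microphone spacing
src = np.array([1.0, 0.0])                            # true source direction (0 degrees)
x = np.stack([np.sin(2 * np.pi * f0 * (t + (m @ src) / 343.0)) for m in mics])

on_target = delay_and_sum(x, mics, 0)    # steered at the (visually located) speaker
off_target = delay_and_sum(x, mics, 90)  # steered away from it
```

Steering at the visually reported direction reconstructs the source exactly in this idealized far-field setup, while steering 90 degrees off attenuates it; a real system would additionally have to cope with reverberation and visual localization error.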
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/24—Speech recognition using non-acoustical features
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/06—Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N7/00—Television systems
- H04N7/14—Systems for two-way working
- H04N7/15—Conference systems
Similar Documents
Publication | Title |
---|---|
Nakadai et al. | Active audition for humanoid |
US6967455B2 (en) | Robot audiovisual system |
Nakagawa et al. | Using vision to improve sound source separation |
Lang et al. | Providing the basis for human-robot-interaction: A multi-modal attention system for a mobile robot |
Nakadai et al. | Real-time sound source localization and separation for robot audition |
EP1818909B1 (en) | Voice recognition system |
CN111833899B (en) | Voice detection method based on polyphonic regions, related device and storage medium |
Mizumoto et al. | Design and implementation of selectable sound separation on the Texai telepresence system using HARK |
US20090030552A1 (en) | Robotics visual and auditory system |
Maganti et al. | Speech enhancement and recognition in meetings with an audio-visual sensor array |
Nakadai et al. | Real-time speaker localization and speech separation by audio-visual integration |
Bub et al. | Knowing who to listen to in speech recognition: Visually guided beamforming |
Nakadai et al. | Epipolar geometry based sound localization and extraction for humanoid audition |
Khan et al. | Video-aided model-based source separation in real reverberant rooms |
JP3632099B2 (en) | Robot audio-visual system |
Ban et al. | Exploiting the complementarity of audio and visual data in multi-speaker tracking |
Nakadai et al. | Real-time tracking of multiple sound sources by integration of in-room and robot-embedded microphone arrays |
Tesch et al. | Multi-channel speech separation using spatially selective deep non-linear filters |
Brandstein et al. | Microphone-array localization error estimation with application to sensor placement |
Okuno et al. | Computational auditory scene analysis and its application to robot audition |
Nakadai et al. | Exploiting auditory fovea in humanoid-human interaction |
Okuno et al. | Sound and visual tracking for humanoid robot |
JP3843743B2 (en) | Robot audio-visual system |
Okuno et al. | Robot audition: Missing feature theory approach and active audition |
Okuno et al. | Incorporating visual information into sound source separation |