
Nakagawa et al., 1999 - Google Patents

Using vision to improve sound source separation


Document ID
4167630125237943999
Authors
Nakagawa Y
Okuno H
Kitano H
et al.
Publication year
1999
Publication venue
Proceedings of the National Conference on Artificial Intelligence


Snippet

We present a method of improving sound source separation using vision. Sound source separation is an essential function for accomplishing auditory scene understanding, separating the streams of sound generated by multiple sound sources. By separating a stream …
Continue reading at cdn.aaai.org (PDF)
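The snippet only hints at how visual information can guide separation. As a rough, hypothetical illustration (not the paper's actual algorithm, and with all names, array geometry, and parameters being assumptions), a source direction estimated from vision can steer a simple delay-and-sum beamformer over a microphone array toward the seen speaker:

```python
import numpy as np

# Hypothetical sketch: vision supplies an azimuth for the target source,
# which steers a delay-and-sum beamformer over a linear microphone array.
# All values below are illustrative, not taken from the paper.

SPEED_OF_SOUND = 343.0  # m/s


def steering_delays(mic_positions, azimuth_rad, fs):
    """Per-microphone delays (in samples) for a far-field source at the
    visually estimated azimuth; mic_positions is (n_mics, 2) in metres."""
    unit = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # Project each mic position onto the arrival direction to get time offsets.
    tau = mic_positions @ unit / SPEED_OF_SOUND  # seconds
    tau -= tau.min()  # shift so all delays are non-negative
    return np.round(tau * fs).astype(int)


def delay_and_sum(signals, delays):
    """Advance each channel by its steering delay and average: the target
    direction adds coherently, while off-axis sources are attenuated."""
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, d in zip(signals, delays):
        out[: n - d] += sig[d:]
    return out / len(signals)


if __name__ == "__main__":
    fs = 16000
    mics = np.array([[0.0, 0.0], [0.05, 0.0], [0.10, 0.0], [0.15, 0.0]])
    delays = steering_delays(mics, np.deg2rad(60.0), fs)
    print(delays)
```

For a broadside source (azimuth 90° relative to this x-axis array) the delays are all zero and the beamformer reduces to plain channel averaging; steering off broadside introduces the inter-microphone delays that favour the visually located source.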

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208: Noise filtering
    • G10L21/0216: Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161: Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166: Microphone arrays; Beamforming
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/24: Speech recognition using non-acoustical features
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06: Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems

Similar Documents

Publication / Title
Nakadai et al. Active audition for humanoid
US6967455B2 (en) Robot audiovisual system
Nakagawa et al. Using vision to improve sound source separation
Lang et al. Providing the basis for human-robot-interaction: A multi-modal attention system for a mobile robot
Nakadai et al. Real-time sound source localization and separation for robot audition.
EP1818909B1 (en) Voice recognition system
CN111833899B (en) Voice detection method based on polyphonic regions, related device and storage medium
Mizumoto et al. Design and implementation of selectable sound separation on the Texai telepresence system using HARK
US20090030552A1 (en) Robotics visual and auditory system
Maganti et al. Speech enhancement and recognition in meetings with an audio–visual sensor array
Nakadai et al. Real-time speaker localization and speech separation by audio-visual integration
Bub et al. Knowing who to listen to in speech recognition: Visually guided beamforming
Nakadai et al. Epipolar geometry based sound localization and extraction for humanoid audition
Khan et al. Video-aided model-based source separation in real reverberant rooms
JP3632099B2 (en) Robot audio-visual system
Ban et al. Exploiting the complementarity of audio and visual data in multi-speaker tracking
Nakadai et al. Real-time tracking of multiple sound sources by integration of in-room and robot-embedded microphone arrays
Tesch et al. Multi-channel speech separation using spatially selective deep non-linear filters
Brandstein et al. Microphone‐array localization error estimation with application to sensor placement
Okuno et al. Computational auditory scene analysis and its application to robot audition
Nakadai et al. Exploiting auditory fovea in humanoid-human interaction
Okuno et al. Sound and visual tracking for humanoid robot
JP3843743B2 (en) Robot audio-visual system
Okuno et al. Robot audition: Missing feature theory approach and active audition
Okuno et al. Incorporating visual information into sound source separation