
Han et al., 2024 - Google Patents

Unsupervised multi-channel separation and adaptation


Document ID: 185813939013907697
Authors: Han C, Wilson K, Wisdom S, Hershey J
Publication year: 2024
Publication venue: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)


Snippet

A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to …
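The mixture invariant training (MixIT) idea referenced in the snippet can be sketched briefly. This is a minimal single-channel illustration under my own assumptions (function names and the brute-force assignment search are mine), not the paper's multi-channel implementation: the model separates a mixture of two mixtures into sources, and the loss takes the best binary assignment of sources back to the two reference mixtures.

```python
import itertools
import numpy as np

def neg_snr(ref, est, eps=1e-8):
    # Negative signal-to-noise ratio in dB (lower is better).
    err = ref - est
    return -10.0 * np.log10(np.sum(ref**2) / (np.sum(err**2) + eps) + eps)

def mixit_loss(x1, x2, sources):
    """Brute-force MixIT loss sketch.

    x1, x2: the two reference mixtures, each shape (T,).
    sources: the model's separated outputs, shape (M, T).
    Each source is assigned to exactly one mixture; we search all
    2**M binary assignments for the one minimizing total negative SNR.
    """
    M = sources.shape[0]
    best = np.inf
    for assign in itertools.product([0, 1], repeat=M):
        a = np.array(assign)
        est1 = sources[a == 0].sum(axis=0)  # sources assigned to mixture 1
        est2 = sources[a == 1].sum(axis=0)  # sources assigned to mixture 2
        best = min(best, neg_snr(x1, est1) + neg_snr(x2, est2))
    return best
```

In practice the assignment search is done per training example and the loss is backpropagated through the separation network; the exhaustive search above is only tractable for small M.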
Full text available at arxiv.org (PDF).

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G10L21/0272 Voice signal separating
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/065 Adaptation
    • G10L15/07 Adaptation to the speaker
    • G10L15/08 Speech classification or search
    • G10L17/00 Speaker identification or verification
    • G10L17/04 Training, enrolment or model building
    • G10L17/26 Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
    • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
    • G10L19/008 Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition

Similar Documents

Wang et al. Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation
Zhang et al. ADL-MVDR: All deep learning MVDR beamformer for target speech separation
Wisdom et al. Unsupervised sound separation using mixture invariant training
Wang et al. Voicefilter: Targeted voice separation by speaker-conditioned spectrogram masking
Luo et al. Conv-tasnet: Surpassing ideal time–frequency magnitude masking for speech separation
Sainath et al. Multichannel signal processing with deep neural networks for automatic speech recognition
Tan et al. Audio-visual speech separation and dereverberation with a two-stage multimodal network
Pandey et al. TPARN: Triple-path attentive recurrent network for time-domain multichannel speech enhancement
Zhang et al. On end-to-end multi-channel time domain speech separation in reverberant environments
JP6622159B2 (en) Signal processing system, signal processing method and program
Han et al. Unsupervised multi-channel separation and adaptation
Zhang et al. Multi-channel multi-frame ADL-MVDR for target speech separation
Tesch et al. Multi-channel speech separation using spatially selective deep non-linear filters
Sivaraman et al. Adapting speech separation to real-world meetings using mixture invariant training
von Neumann et al. Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
Hussain et al. Ensemble hierarchical extreme learning machine for speech dereverberation
Taherian et al. Multi-channel conversational speaker separation via neural diarization
Subramanian et al. Student-teacher learning for BLSTM mask-based speech enhancement
Wang et al. UNSSOR: Unsupervised neural speech separation by leveraging over-determined training mixtures
Sahidullah et al. The speed submission to DIHARD II: Contributions & lessons learned
Grondin et al. Gev beamforming supported by doa-based masks generated on pairs of microphones
Liu et al. Iterative deep neural networks for speaker-independent binaural blind speech separation
Aralikatti et al. Reverberation as supervision for speech separation
Liang et al. Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Ocal et al. Adversarially trained autoencoders for parallel-data-free voice conversion