Han et al., 2024 - Google Patents
Unsupervised multi-channel separation and adaptation (Han et al., 2024)
- Document ID
- 185813939013907697
- Author
- Han C
- Wilson K
- Wisdom S
- Hershey J
- Publication year
- 2024
- Publication venue
- ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to …
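The snippet above refers to mixture invariant training (MixIT), in which a model separates a synthetic mixture of two reference mixtures and is scored under the best assignment of its outputs back to those mixtures. As an illustration only, and not code from the paper, the single-channel MixIT objective can be sketched as follows; the function names and the simple SNR loss are assumptions of this sketch:

```python
import itertools
import numpy as np

def snr_loss(est, ref, eps=1e-8):
    # Negative signal-to-noise ratio in dB (lower is better).
    err = ref - est
    return -10.0 * np.log10((ref ** 2).sum() / ((err ** 2).sum() + eps) + eps)

def mixit_loss(sources, mix1, mix2):
    """Sketch of the MixIT objective (after Wisdom et al., listed below).

    sources: (M, T) array of sources estimated from the mixture of
             mixtures mix1 + mix2.
    Each estimated source is assigned to exactly one of the two
    reference mixtures; the loss is that of the best assignment.
    """
    m = sources.shape[0]
    best = np.inf
    # Enumerate all 2^M binary assignments of sources to mixtures.
    for bits in itertools.product([0, 1], repeat=m):
        est1 = sources[[i for i in range(m) if bits[i] == 0]].sum(axis=0)
        est2 = sources[[i for i in range(m) if bits[i] == 1]].sum(axis=0)
        loss = snr_loss(est1, mix1) + snr_loss(est2, mix2)
        best = min(best, loss)
    return best
```

The exhaustive enumeration over 2^M assignments is practical only for the small M typically used in MixIT; the multi-channel extension the paper describes is not captured by this single-channel sketch.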
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
Publication | Title
---|---
Wang et al. | Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation
Zhang et al. | ADL-MVDR: All deep learning MVDR beamformer for target speech separation
Wisdom et al. | Unsupervised sound separation using mixture invariant training
Wang et al. | VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking
Luo et al. | Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Sainath et al. | Multichannel signal processing with deep neural networks for automatic speech recognition
Tan et al. | Audio-visual speech separation and dereverberation with a two-stage multimodal network
Pandey et al. | TPARN: Triple-path attentive recurrent network for time-domain multichannel speech enhancement
Zhang et al. | On end-to-end multi-channel time domain speech separation in reverberant environments
JP6622159B2 | Signal processing system, signal processing method and program
Han et al. | Unsupervised multi-channel separation and adaptation
Zhang et al. | Multi-channel multi-frame ADL-MVDR for target speech separation
Tesch et al. | Multi-channel speech separation using spatially selective deep non-linear filters
Sivaraman et al. | Adapting speech separation to real-world meetings using mixture invariant training
von Neumann et al. | Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
Hussain et al. | Ensemble hierarchical extreme learning machine for speech dereverberation
Taherian et al. | Multi-channel conversational speaker separation via neural diarization
Subramanian et al. | Student-teacher learning for BLSTM mask-based speech enhancement
Wang et al. | UNSSOR: Unsupervised neural speech separation by leveraging over-determined training mixtures
Sahidullah et al. | The Speed submission to DIHARD II: Contributions & lessons learned
Grondin et al. | GEV beamforming supported by DOA-based masks generated on pairs of microphones
Liu et al. | Iterative deep neural networks for speaker-independent binaural blind speech separation
Aralikatti et al. | Reverberation as supervision for speech separation
Liang et al. | Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Ocal et al. | Adversarially trained autoencoders for parallel-data-free voice conversion