Han et al., 2024 - Google Patents
Unsupervised multi-channel separation and adaptation (Han et al., 2024)
- Document ID
- 185813939013907697
- Author
- Han C
- Wilson K
- Wisdom S
- Hershey J
- Publication year
- 2024
- Publication venue
- ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Snippet
A key challenge in machine learning is to generalize from training data to an application domain of interest. This work extends the recently-proposed mixture invariant training (MixIT) algorithm to perform unsupervised learning in the multi-channel setting. We use MixIT to …
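The snippet above refers to mixture invariant training (MixIT), in which a model separates a synthetic mixture of two reference mixtures and is scored under the best assignment of its outputs back to those mixtures. As an illustration only, and not code from the paper, the single-channel MixIT objective can be sketched as follows; the function names and the simple SNR loss are assumptions of this sketch:

```python
import itertools
import numpy as np

def snr_loss(est, ref, eps=1e-8):
    # Negative signal-to-noise ratio in dB (lower is better).
    err = ref - est
    return -10.0 * np.log10((ref ** 2).sum() / ((err ** 2).sum() + eps) + eps)

def mixit_loss(sources, mix1, mix2):
    """Sketch of the MixIT objective (after Wisdom et al., listed below).

    sources: (M, T) array of sources estimated from the mixture of
             mixtures mix1 + mix2.
    Each estimated source is assigned to exactly one of the two
    reference mixtures; the loss is that of the best assignment.
    """
    m = sources.shape[0]
    best = np.inf
    # Enumerate all 2^M binary assignments of sources to mixtures.
    for bits in itertools.product([0, 1], repeat=m):
        est1 = sources[[i for i in range(m) if bits[i] == 0]].sum(axis=0)
        est2 = sources[[i for i in range(m) if bits[i] == 1]].sum(axis=0)
        loss = snr_loss(est1, mix1) + snr_loss(est2, mix2)
        best = min(best, loss)
    return best
```

The exhaustive enumeration over 2^M assignments is practical only for the small M typically used in MixIT; the multi-channel extension the paper describes is not captured by this single-channel sketch.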
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/26—Recognition of special voice characteristics, e.g. for use in lie detectors; Recognition of animal voices
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Similar Documents
Publication | Title
---|---
Wang et al. | Multi-microphone complex spectral mapping for utterance-wise and continuous speech separation
Zhang et al. | ADL-MVDR: All deep learning MVDR beamformer for target speech separation
Wisdom et al. | Unsupervised sound separation using mixture invariant training
Wang et al. | VoiceFilter: Targeted voice separation by speaker-conditioned spectrogram masking
Luo et al. | Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation
Sainath et al. | Multichannel signal processing with deep neural networks for automatic speech recognition
Tan et al. | Audio-visual speech separation and dereverberation with a two-stage multimodal network
Pandey et al. | TPARN: Triple-path attentive recurrent network for time-domain multichannel speech enhancement
Zhang et al. | On end-to-end multi-channel time domain speech separation in reverberant environments
JP6622159B2 | Signal processing system, signal processing method and program
Han et al. | Unsupervised multi-channel separation and adaptation
Zhang et al. | Multi-channel multi-frame ADL-MVDR for target speech separation
Tesch et al. | Multi-channel speech separation using spatially selective deep non-linear filters
Sivaraman et al. | Adapting speech separation to real-world meetings using mixture invariant training
von Neumann et al. | Graph-PIT: Generalized permutation invariant training for continuous separation of arbitrary numbers of speakers
Hussain et al. | Ensemble hierarchical extreme learning machine for speech dereverberation
Taherian et al. | Multi-channel conversational speaker separation via neural diarization
Subramanian et al. | Student-teacher learning for BLSTM mask-based speech enhancement
Wang et al. | UNSSOR: Unsupervised neural speech separation by leveraging over-determined training mixtures
Sahidullah et al. | The Speed submission to DIHARD II: Contributions & lessons learned
Grondin et al. | GEV beamforming supported by DOA-based masks generated on pairs of microphones
Liu et al. | Iterative deep neural networks for speaker-independent binaural blind speech separation
Aralikatti et al. | Reverberation as supervision for speech separation
Liang et al. | Attention-based multi-channel speaker verification with ad-hoc microphone arrays
Ocal et al. | Adversarially trained autoencoders for parallel-data-free voice conversion