Morales-Cordovilla et al., 2014 - Google Patents

Distant speech recognition in reverberant noisy conditions employing a microphone array

Document ID: 2021858435034491912
Authors: Morales-Cordovilla J; Hagmüller M; Pessentheiner H; Kubin G
Publication year: 2014
Publication venue: 2014 22nd European Signal Processing Conference (EUSIPCO)

Snippet

This paper addresses the problem of distant speech recognition in reverberant noisy conditions employing a microphone array. We present a prototype system that can segment the utterances in real-time and generate robust ASR results off-line. The segmentation is …

Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/04—Training, enrolment or model building
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. hidden Markov models [HMMs]
- G10L15/142—Hidden Markov Models [HMMs]
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

Similar Documents

Publication	Publication Date	Title
EP3707716B1 (en)	2021-12-01	Multi-channel speech separation
Vincent et al.	2013	The second ‘CHiME’ speech separation and recognition challenge: An overview of challenge systems and outcomes
Kumatani et al.	2011	Channel selection based on multichannel cross-correlation coefficients for distant speech recognition
Hori et al.	2015	The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition
Minhua et al.	2019	Frequency domain multi-channel acoustic modeling for distant speech recognition
Mitra et al.	2014	Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
DEREVERBERATION et al.	2014	REVERB Workshop 2014
Matassoni et al.	2002	Hidden Markov model training with contaminated speech material for distant-talking speech recognition
Matassoni et al.	2014	The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones
KR20210137146A (en)	2021-11-17	Speech augmentation using clustering of queues
Bohlender et al.	2021	Neural networks using full-band and subband spatial features for mask based source separation
Gu et al.	2024	Rezero: Region-customizable sound extraction
Yamakawa et al.	2011	Environmental sound recognition for robot audition using matching-pursuit
US11528571B1 (en)	2022-12-13	Microphone occlusion detection
Varela et al.	2011	Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector
Xiong et al.	2018	Channel selection using neural network posterior probability for speech recognition with distributed microphone arrays in everyday environments
Yoshioka et al.	2013	Noise model transfer: Novel approach to robustness against nonstationary noise
Rodomagoulakis et al.	2013	Experiments on far-field multichannel speech processing in smart homes
Mitra et al.	2014	Deep convolutional nets and robust features for reverberation-robust speech recognition
Kindt et al.	2022	Improved separation of closely-spaced speakers by exploiting auxiliary direction of arrival information within a u-net architecture
Hu et al.	2015	Single-channel speaker diarization based on spatial features
Morales-Cordovilla et al.	2014	Room localization for distant speech recognition
Morales-Cordovilla et al.	2013	A German distant speech recognizer based on 3D beamforming and harmonic missing data mask
Xue et al.	2018	A study on improving acoustic model for robust and far-field speech recognition