Santos et al., 2024 - Google Patents
Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model
- Document ID
- 16544385399067116183
- Authors
- Santos A
- Masiero B
- Mateus T
- Publication year
- 2024
- Publication venue
- arXiv preprint arXiv:2404.14564
Snippet
One key aspect differentiating data-driven single- and multi-channel speech enhancement and dereverberation methods is that both the problem formulation and complexity of the solutions are considerably more challenging in the latter case. Additionally, with limited …
Classifications

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
- G10L19/008—Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/065—Adaptation
- G10L15/07—Adaptation to the speaker

- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R3/00—Circuits for transducers, loudspeakers or microphones
- H04R3/005—Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones

- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04S—STEREOPHONIC SYSTEMS
- H04S2400/00—Details of stereophonic systems covered by H04S but not provided for in its groups
- H04S2400/15—Aspects of sound capture and related signal processing for recording or reproduction
Similar Documents
Publication | Title
---|---
Ochiai et al. | Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer
Zhang et al. | Deep learning based binaural speech separation in reverberant environments
Sainath et al. | Multichannel signal processing with deep neural networks for automatic speech recognition
Barker et al. | The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines
Guizzo et al. | L3DAS22 challenge: Learning 3D audio sources in a real office environment
Schädler et al. | Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
Grais et al. | Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders
Roman et al. | Binaural segregation in multisource reverberant environments
Lu et al. | ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding
Dadvar et al. | Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Wang et al. | Count and separate: Incorporating speaker counting for continuous speaker separation
Wang et al. | Localization based sequential grouping for continuous speech separation
Pertilä | Online blind speech separation using multiple acoustic speaker tracking and time–frequency masking
Marti et al. | Automatic speech recognition in cocktail-party situations: A specific training for separated speech
Lee et al. | Improved mask-based neural beamforming for multichannel speech enhancement by snapshot matching masking
Santos et al. | Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model
CN117711422A (en) | Underdetermined voice separation method and device based on compressed sensing space information estimation
Kuang et al. | Three-stage hybrid neural beamformer for multi-channel speech enhancement
Gul et al. | Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source
Kindt et al. | Improved separation of closely-spaced speakers by exploiting auxiliary direction of arrival information within a U-Net architecture
He et al. | Mask-based blind source separation and MVDR beamforming in ASR
Cobos et al. | Two-microphone separation of speech mixtures based on interclass variance maximization
Sun et al. | A two-stage single-channel speaker-dependent speech separation approach for the CHiME-5 challenge
Do | Subband temporal envelope features and data augmentation for end-to-end recognition of distant conversational speech
Ideli | Audio-visual speech processing using deep learning techniques