
Santos et al., 2024 - Google Patents

Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model


Document ID: 16544385399067116183
Authors: Santos A, Masiero B, Mateus T
Publication year: 2024
Publication venue: arXiv preprint arXiv:2404.14564

Snippet

One key aspect differentiating data-driven single- and multi-channel speech enhancement and dereverberation methods is that both the problem formulation and complexity of the solutions are considerably more challenging in the latter case. Additionally, with limited …

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
                    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
                        • G10L21/0208 Noise filtering
                            • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                                • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
                                • G10L2021/02166 Microphone arrays; Beamforming
                • G10L15/00 Speech recognition
                    • G10L15/08 Speech classification or search
                        • G10L15/18 Speech classification or search using natural language modelling
                    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
                        • G10L15/065 Adaptation
                            • G10L15/07 Adaptation to the speaker
                • G10L19/00 Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signal, using source filter models or psychoacoustic analysis
                    • G10L19/008 Multichannel audio signal coding or decoding, i.e. using interchannel correlation to reduce redundancies, e.g. joint-stereo, intensity-coding, matrixing
                • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
                    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
                • G10L17/00 Speaker identification or verification
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
                • H04R3/00 Circuits for transducers, loudspeakers or microphones
                    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
            • H04S STEREOPHONIC SYSTEMS
                • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
                    • H04S2400/15 Aspects of sound capture and related signal processing for recording or reproduction

Similar Documents

Ochiai et al. Beam-TasNet: Time-domain audio separation network meets frequency-domain beamformer
Zhang et al. Deep learning based binaural speech separation in reverberant environments
Sainath et al. Multichannel signal processing with deep neural networks for automatic speech recognition
Barker et al. The third ‘CHiME’ speech separation and recognition challenge: Dataset, task and baselines
Guizzo et al. L3DAS22 challenge: Learning 3D audio sources in a real office environment
Schädler et al. Separable spectro-temporal Gabor filter bank features: Reducing the complexity of robust features for automatic speech recognition
Grais et al. Raw multi-channel audio source separation using multi-resolution convolutional auto-encoders
Roman et al. Binaural segregation in multisource reverberant environments
Lu et al. ESPnet-SE++: Speech enhancement for robust speech recognition, translation, and understanding
Dadvar et al. Robust binaural speech separation in adverse conditions based on deep neural network with modified spatial features and training target
Wang et al. Count and separate: Incorporating speaker counting for continuous speaker separation
Wang et al. Localization based sequential grouping for continuous speech separation
Pertilä Online blind speech separation using multiple acoustic speaker tracking and time–frequency masking
Marti et al. Automatic speech recognition in cocktail-party situations: A specific training for separated speech
Lee et al. Improved mask-based neural beamforming for multichannel speech enhancement by snapshot matching masking
Santos et al. Exploring the Potential of Data-Driven Spatial Audio Enhancement Using a Single-Channel Model
CN117711422A (en) Underdetermined voice separation method and device based on compressed sensing space information estimation
Kuang et al. Three-stage hybrid neural beamformer for multi-channel speech enhancement
Gul et al. Preserving the beamforming effect for spatial cue-based pseudo-binaural dereverberation of a single source
Kindt et al. Improved separation of closely-spaced speakers by exploiting auxiliary direction of arrival information within a u-net architecture
He et al. Mask-based blind source separation and MVDR beamforming in ASR
Cobos et al. Two-microphone separation of speech mixtures based on interclass variance maximization
Sun et al. A two-stage single-channel speaker-dependent speech separation approach for the CHiME-5 challenge
Do Subband temporal envelope features and data augmentation for end-to-end recognition of distant conversational speech
Ideli Audio-visual speech processing using deep learning techniques